Foreword


Genetic improvement programs of plants and livestock are aimed at maximizing the 
rate of increase of some merit function (e.g., economic value of a wheat line) that is 
expected to have a genetic basis. Typically, candidates for selection with the highest 
merit are kept as parents of the subsequent generation and those with the lowest 
merit are eliminated (“culled”) or used less intensively. There are at least two key 
questions associated with this endeavor: how merit is defined and how it is assessed. 

Merit can be represented by a linear or nonlinear function of genetic values for 
several traits regarded as important from the perspective of producing economic 
returns or benefits. The genetic component of merit cannot be observed; thus, it must 
be inferred from data on the candidates for selection, or on their relatives. Hence, and 
apart from the issue of specifying economic values (an area requiring expertise 
beyond animal and plant breeding), the problem of inferring merit is largely a
statistical one.

This book represents a substantial compilation of work done in an area known as 
“selection indices” in animal and plant breeding. Selection indices were originally 
developed by Smith (1936) in plant breeding and by Hazel (1943) in animal breeding 
to address the selection of plants or animals scored for multiple attributes. In 
agriculture, the breeding worth (or net genetic merit) of a candidate for selection 
depends on several traits. For example, milk production and composition, health, 
reproductive performance, and life-span in dairy cows; and grain yield, disease 
resistance, and flowering time in maize. Smith (1936) defined a linear merit function
in which the “merit” ($H$, say) of a candidate was expressed as $H = \sum_{i=1}^{t} w_i g_i$, where
$t$ is the number of traits, $g_i$ is the unobservable additive genetic value (breeding
value) of the candidate for trait $i$, and $w_i$ is the relative economic value of trait
$i$ (calculated externally and taken as a known quantity); in vector notation, $H = w'g$,
where $w$ and $g$ are $t \times 1$ vectors of relative economic values and breeding values
respectively. The preceding definition of $H$ implies that the rate of increase of merit
rises by $w_i$ units as the breeding value for trait $i$ rises by one unit; thus, it is somewhat
naive, as it does not contemplate diminishing returns, nonlinearity, or situations in
which the economic return from increasing trait 1, say, depends on the genetic level
for trait 2.
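
As a small numerical illustration of the linear merit function, the following sketch computes $H = w'g$ for three hypothetical candidates and ranks them. The economic values, the breeding values, and the helper name `net_merit` are invented for illustration; in practice $g$ is unobservable and would be replaced by predictions.

```python
# Linear net genetic merit H = w'g for a few hypothetical candidates.
# All numbers are invented; true breeding values are never observed directly.

def net_merit(w, g):
    """Return H = sum_i w_i * g_i for one candidate."""
    if len(w) != len(g):
        raise ValueError("w and g must have the same number of traits")
    return sum(wi * gi for wi, gi in zip(w, g))

# Relative economic values for t = 3 traits (assumed known).
w = [1.0, 0.5, -0.2]

# Hypothetical breeding values, one vector per candidate.
candidates = {
    "A": [2.0, 1.0, 0.5],
    "B": [1.5, 3.0, 0.0],
    "C": [0.5, 0.5, 2.0],
}

merit = {name: net_merit(w, g) for name, g in candidates.items()}

# Rank candidates from highest to lowest merit and keep the top two as parents.
ranking = sorted(merit, key=merit.get, reverse=True)
parents = ranking[:2]
print(ranking, parents)
```

Because $H$ is linear, raising the breeding value of trait $i$ by one unit changes the merit of any candidate by exactly $w_i$, whatever the levels of the other traits.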

The book contains a wealth of material on how various types of linear indices can 
be constructed, interpreted, optimized, and applied. The techniques described in the 
book were developed mainly with plant breeding as a focal point, an area in which 
the authors have wide experience. However, I expect that the book will be of interest 
to animal breeders as well. The linear selection index (LSI) theory developed in this 
book is based on the Smith (1936) and Hazel (1943) linear phenotypic selection 
index (LPSI) (Chap. 2), and all the LSIs described in Chaps. 3–9 are only variants of
the LPSI. Thus, in Chap. 3, the authors describe null restrictions and non-null
predetermined restrictions imposed on the expected genetic gain of the LPSI. In
Chap. 4, the authors incorporate molecular marker information into the LPSI, and
in Chap. 5 genomic estimated breeding values (GEBVs) are included in the LPSI.
Interestingly, Chap. 6 shows how the restrictive LPSI is used in the genomic 
selection context, but this is based on the LPSI theory of Smith (1936) and Hazel 
(1943). In Chaps. 7 and 8 the only change was to assume that the economic weights 
are fixed, but unknown, and then, based on this assumption, the authors demonstrate 
the eigen selection index method (ESIM) and its variants, which are, of course, 
associated with the LPSI. In Chap. 9, the reader is shown how to combine the LPSI 
theory with the independent culling method to develop the multistage selection 
index theory. 

Chapter 10 shows results on stochastic simulations from cycles of selections 
using the linear phenotypic selection index (LPSI), the ESIM, the restrictive LPSI 
and the restrictive ESIM. In Chap. 11 the use of RindSel (R software to analyze 
Selection Indices) is presented with examples for using unrestrictive, restrictive, null 
or predetermined proportional gain indices. 

Animal and plant breeders follow somewhat different routes in the treatment of 
multiple-trait improvement by selection, mainly because the former field deals with 
candidates possessing an unequal amount of information, and extensive genetic 
inter-relatedness. Recently, however, genomic selection has reunified perspectives 
somewhat. In animal breeding, Henderson (1973) introduced the notion of “best
prediction,” and showed that the conditional expectation function $E(H \mid \text{DATA})$,
where DATA represents all available records on all traits, unbalanced or not, was
the “best predictor” in the sense of the mean squared error. He also showed that the
best predictor had some additional properties that were appealing from a response to 
selection perspective. 

In a multiple-trait context and assuming multivariate normality (with known 
parameters) of the joint distribution of genetic values and DATA, the best predictor 
retrieves the selection index evaluation derived by Smith (1936) and Hazel (1943) in 
less general settings (Henderson 1963). It follows immediately that if $w$ is known,
the best predictor of merit is

$$E(H \mid \text{DATA}) = E(w'g \mid \text{DATA}) = w'E(g \mid \text{DATA}),$$

where $E(g \mid \text{DATA})$ is the best predictor of the breeding values. Smith (1936) and
Hazel (1943) failed to recognize that the economic values did not need to enter into 
the selection index until after the predictions of the breeding values were obtained, 
simply because of linear invariance. Bulmer (1980) pointed out, pertinently, that it 
was unclear why ranking animals using a predictor, minimizing the mean squared 
error of prediction, would maximize expected genetic progress in a single round of 
selection, and suggested an alternative predictor that was later shown by Gianola and 
Goffinet (1982) and Fernando and Gianola (1986) to be exactly the best predictor. 
Animal breeders can perhaps interpret many of the results given in this book from 
such a perspective. 

A more difficult problem (although outside of the scope of the book) is that of 
inferring nonlinear merit. Suppose now that the merit of a candidate has the form: 


$$H = w'g + g'Qg,$$

where $w'$ is a known row vector, as above, and $Q$ is a known matrix, assumed to be
symmetric without loss of generality. The conditional distribution of $H$ given DATA
does not have a closed form, but it can be estimated using Monte Carlo methods by
drawing samples of $g$ from some posterior distribution and, thus, obtaining samples
of $H$ from the preceding expression. If $\hat{g} = E(g \mid \text{DATA})$ and $C = \mathrm{Var}(g \mid \text{DATA})$ are
available, the mean and variance of the conditional distribution of $H$ can be
calculated analytically. The mean is

$$E(H \mid \text{DATA}) = w'\hat{g} + \hat{g}'Q\hat{g} + \mathrm{tr}(QC)$$

and, assuming multivariate normality, the variance is

$$\mathrm{Var}(H \mid \text{DATA}) = \mathrm{Var}(w'g) + \mathrm{Var}(g'Qg) + 2w'\mathrm{Cov}(g, g'Qg)$$
$$= w'Cw + 2\,\mathrm{tr}[(QC)^2] + 4\hat{g}'QCQ\hat{g} + 4w'CQ\hat{g},$$

where the last term follows from $\mathrm{Cov}(g, g'Qg) = 2CQ\hat{g}$ under normality.


Contrary to the case of a linear merit function, the precision of the evaluation of a
candidate or, equivalently, the reliability of its evaluation, enters nontrivially when
inferring second-order merit. Gianola and Fernando (1986) suggested the Bayesian
approach as a general inferential method for solving a large number of animal
breeding problems, linear or nonlinear, even in situations where there is uncertainty
about all location and dispersion parameters. Today, the posterior distribution
of any nonlinear merit function can be arrived at via Monte Carlo sampling.
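
The two routes to the moments of a second-order merit function can be compared in a small sketch. It assumes that the conditional distribution of $g$ is multivariate normal with mean $\hat{g}$ and covariance $C$, uses the standard normal-theory moments of a quadratic form (which give a coefficient of 4 on the cross term $w'CQ\hat{g}$), and draws Monte Carlo samples of $H$ for comparison. All numerical values and function names are hypothetical.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def cholesky(C):
    """Lower-triangular L with L L' = C (C must be positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def quadratic_merit(w, Q, g):
    """H = w'g + g'Qg for one vector of breeding values."""
    return dot(w, g) + dot(g, matvec(Q, g))

def analytic_moments(w, Q, g_hat, C):
    """Mean and variance of H when g ~ N(g_hat, C)."""
    QC = matmul(Q, C)
    Qg = matvec(Q, g_hat)
    CQg = matvec(C, Qg)
    mean = dot(w, g_hat) + dot(g_hat, Qg) + trace(QC)
    var = (dot(w, matvec(C, w)) + 2.0 * trace(matmul(QC, QC))
           + 4.0 * dot(Qg, CQg) + 4.0 * dot(w, CQg))
    return mean, var

def monte_carlo_moments(w, Q, g_hat, C, n_draws, seed):
    """Sample g = g_hat + L z with z ~ N(0, I) and summarize H."""
    rng = random.Random(seed)
    L = cholesky(C)
    total = total_sq = 0.0
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in g_hat]
        g = [m + dot(row, z) for m, row in zip(g_hat, L)]
        h = quadratic_merit(w, Q, g)
        total += h
        total_sq += h * h
    mean = total / n_draws
    return mean, total_sq / n_draws - mean ** 2

# Hypothetical two-trait example.
w = [1.0, 2.0]
Q = [[0.5, 0.1], [0.1, 0.3]]
g_hat = [1.0, -0.5]
C = [[0.4, 0.1], [0.1, 0.3]]

exact_mean, exact_var = analytic_moments(w, Q, g_hat, C)
mc_mean, mc_var = monte_carlo_moments(w, Q, g_hat, C, n_draws=200_000, seed=1)
print(exact_mean, exact_var, mc_mean, mc_var)
```

The Monte Carlo route extends directly to merit functions with no analytical moments: only `quadratic_merit` would need to change.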

Even when the statistical principles are well understood, it is often useful to 
understand the “architecture” of selection indices. The book is unique in presenting 
techniques needed to attain such an understanding, and represents a very valuable 
contribution to the statistical genetics of quantitative traits. It constitutes essential 
reading for plant quantitative geneticists working in multiple-trait improvement. 
However, animal breeders will also benefit from studying carefully many of its 
chapters, as these contribute knowledge in areas of animal breeding research where
there has been little traffic. Personally, I am sure that much benefit will be extracted
from studying this valuable and novel contribution to the literature.


Daniel Gianola
Department of Animal Sciences, University of Wisconsin, Madison, WI, USA
Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
Department of Dairy Science, University of Wisconsin, Madison, WI, USA


Preface


In the linear selection index (LSI) theory, the main distinction is between the net 
genetic merit and the LSI. The net genetic merit is a linear combination of the true 
unobservable breeding values of the traits weighted by their respective economic 
values, whereas the LSI is a linear combination of phenotypic values, marker scores 
or genomic estimated breeding values (GEBVs). The LSI can also be a linear 
combination of phenotypic values and marker scores or phenotypic values and 
GEBVs jointly. That is, the LSI is a function of observed phenotypic values, marker 
scores, or GEBVs that is used to predict the net genetic merit and select parents for 
the next generation. Thus, there are three main classes of LSI: phenotypic, marker, 
and genomic. The main advantage of the genomic LSI over the other indices lies in 
the possibility of reducing the intervals between selection cycles by more than two 
thirds. One of the main characteristics of the LSI is that it allows extra merit in one 
trait to offset slight defects in another. Thus, by its use, individuals with very high 
merit in one trait are saved for breeding, even when they are inferior in other traits 
(Hazel and Lush 1942). 

Among the LSIs developed up to now, the main distinction is between an LSI that 
uses economic weights and one that does not use economic weights to predict the net 
genetic merit. The principal LSI theory was developed assuming that the economic 
weights are fixed and known; however, recently, the LSI theory was extended to the 
case where the economic weights are fixed but unknown. This latter theory is more 
general than the first because it does not require the economic weights to be known. 
An additional distinction among the LSIs is between the single-stage LSI and the 
multistage LSI. Multistage LSIs are methods for selecting one or more individual 
traits available at different times or stages; they are applied mainly in animal and tree 
breeding where the target traits become evident at different ages. One advantage of 
the latter method over the single-stage LSI is that the breeder does not need to carry a 
large population of individuals throughout the multi-trait selection process. Some 
authors have used multistage LSI as a cost-saving strategy for improving multiple 
traits, because not all traits need to be measured at each stage. When traits have a
developmental sequence in ontogeny, or there are large differences in the costs of
measuring several traits, the efficiency of multistage LSI over single-stage LSI can
be substantial (Xu and Muir 1991, 1992).

The LSI has two main parameters: the selection response and the expected genetic 
gain per trait or multi-trait selection response. The selection response is associated 
with the mean of the net genetic merit and is defined as the mean of the progeny of 
the selected parents or the future population mean, whereas the expected genetic gain
per trait, or multi-trait selection response, is the population mean of each trait under
selection in the progeny of the selected parents. Thus, although the selection
response is associated with the mean of the net genetic merit, the expected genetic
gain per trait is associated with the mean of each trait under selection. The selection
response and expected genetic gain enable breeders to estimate the expected progress
of the selection before carrying it out. This information gives improvement
programs a clearer orientation and helps to predict the success of the selection 
method adopted and to choose the option that is technically most effective on a 
scientific basis (Costa et al. 2008). 

Based on the restriction imposed on the expected genetic gain per trait, the LSIs 
can be divided into unrestricted, null restricted, or predetermined proportional gains 
indices. The null restricted LSI allows restrictions equal to zero to be imposed on the 
expected genetic gain of some traits, whereas the expected genetic gain of other traits 
increases (or decreases) without imposing any restrictions. In a similar manner, the 
predetermined proportional gains LSI attempts to make some traits change their 
expected genetic gain values based on a predetermined level, whereas the rest of the 
traits remain without restrictions. All the foregoing indices have as their main 
objectives to predict the net genetic merit and select parents for the next generation. 

The LSI theory is based on the multivariate normal distribution because this distribution
allows the traits under selection to be completely described using only means,
variances, and covariances. In addition, under this distribution, uncorrelated traits are
independent, and linear combinations of traits are also normal. Even when the trait
phenotypic values do not have a multivariate normal distribution, this distribution serves
as a useful approximation, especially in inferences involving sample mean vectors,
which, in accordance with the central limit theorem, have an approximately multivariate
normal distribution (Rencher 2002). By this reasoning, a fundamental assumption in the
single-stage LSI theory is that the net genetic merit and the LSI have a bivariate normal
distribution, whereas in the multistage LSI theory, the net genetic merit and the
LSIs have a multivariate normal distribution. Under the latter assumption, the regression
of the net genetic merit on any linear function of the phenotypic values is linear.

The LSI theory developed in this book is based on the Smith (1936) and Hazel
(1943) linear phenotypic selection index (LPSI) described in Chap. 2. As the reader
shall see, all the LSIs described in Chaps. 3–9 of this book are only variants of the
LPSI. Thus, in Chap. 3, the restricted Kempthorne and Nordskog (1959) index only
incorporates null restrictions on the LPSI expected genetic gain, and in a similar
manner, the Mallard (1972) and Tallis (1985) index incorporates non-null
predetermined restrictions on the LPSI expected genetic gain. In Chap. 4, Lande
and Thompson (1990) and Lange and Whittaker (2001) only incorporated
molecular marker information into the LPSI, and in Chap. 5, the authors (Dekkers 2007;
Togashi et al. 2011; Ceron-Rojas et al. 2015) incorporated GEBVs into the LPSI. In
Chap. 6, the only novelty is that the Kempthorne and Nordskog (1959) and the Mallard
(1972) and Tallis (1985) indices are used in the genomic selection context,
but such indices are based on the LPSI theory of Smith (1936) and Hazel (1943). In
Chaps. 7 and 8, the only change is the assumption that the economic weights are fixed
but unknown; based on this assumption, we have developed the eigen
selection index method (ESIM) and its variants, which are, of course, associated with
the LPSI. Finally, in Chap. 9 we show that Cochran (1951) and Young (1964)
combined the LPSI theory with the independent culling method to develop the
multistage selection index theory, but the base theory is the Smith (1936) and
Hazel (1943) LPSI theory.

Note that up to now, we have used the acronym LPSI to denote the Smith (1936)
and Hazel (1943) index, whereas the rest of the indices have been denoted by the
names of their authors. We think that this latter type of notation creates
confusion for the reader, because it gives the impression that there are many theories
associated with the indices or that all the indices were constructed ad hoc. In reality, there is
only one theory, that developed by Smith (1936) and Hazel (1943), and the rest
of the indices are only variants of this theory. In this book, we address this
problem by using a specific acronym for each index (see Table 1.1, Chap. 1 for
details) that indicates the relationship of each index (from Chaps. 3 to 9) with
the LPSI. For example, the null restricted Kempthorne and Nordskog (1959)
index is denoted by RLPSI (restricted linear phenotypic selection index), whereas
the predetermined proportional gain Mallard (1972) and Tallis (1985) index is
denoted by PPG-LPSI (predetermined proportional gains linear phenotypic selection
index). Similar notation has been used for the molecular and genomic indices
(see Table 1.1, Chap. 1 for additional detail). We hope that acronyms such as
RLPSI and PPG-LPSI help the reader to see that these two indices are only
variants of the LPSI developed by Smith (1936) and Hazel (1943). To be specific,
the RLPSI and PPG-LPSI are only projections of the LPSI to a different space. For
example, the RLPSI projects the LPSI vector of coefficients to a smaller space than
the original space of the LPSI vector of coefficients (see Chap. 3 for details).

The only acronyms that may seem strange to the reader are ESIM
(eigen selection index method), RESIM (restricted eigen selection index method),
MESIM (molecular eigen selection index method), etc., which we have used in
Chaps. 7 and 8 and which may seem to be unrelated to the LPSI, RLPSI, etc.
However, we expect that the context and the theory described in the book will
make clear to the reader the relationship among all the indices described in it.
As we shall see in Chaps. 7 and 8, ESIM and its variants are the result of an
application of the canonical correlation theory to the LPSI context. This is the
key to understanding the ESIM theory.

The main objective of this book is to describe the LSI theory and its statistical 
properties. First, we describe the single-stage LSI theory by assuming that economic 
weights are fixed and known to predict the net genetic merit in the phenotypic 
(Chaps. 2 and 3), marker (Chap. 4), and genomic (Chaps. 5 and 6) contexts. Next, we 
describe the LSI by assuming that economic weights are fixed but unknown to 
predict the net genetic merit in the phenotypic (Chap. 7), marker, and genomic 
(Chap. 8) contexts. In Chap. 9, we describe the multistage LSI in the phenotypic,
marker, and genomic contexts assuming that economic weights are fixed and known.
Chapters 10 and 11 present simulation results and SAS and R codes respectively to
estimate the parameters and make selections using some of the LSIs described in
Chaps. 2, 3, 4, 7, and 8.


J. Jesus Cerón-Rojas 
José Crossa


Chapter 1
General Introduction


Abstract We describe the main characteristics of two approaches to the linear
selection index theory. The first approach is called standard linear selection
indices, whereas the second is called eigen selection index methods. In the
first approach, the economic weights are fixed and known, whereas in the second
approach the economic weights are fixed but unknown. This is the main difference
between the two approaches and implies that the eigen selection index methods include
the standard linear selection indices as a particular case, because they do not require
that the economic weights be known. Both types of indices predict the net genetic merit
and maximize the selection response, and they give the breeder an objective criterion
for selecting individuals as parents for the next selection cycle. In addition, in the
prediction they can use phenotypic, marker, and genomic information. In both approaches, the
indices can be unrestricted, null restricted, or predetermined proportional gains indices and
can be used in the context of single-stage or multistage breeding selection schemes.
We finish this chapter by describing the Lagrange multiplier method,
which is the main tool for maximizing the selection index responses.


Linear selection indices that assume that economic weights are fixed and known
to predict the net genetic merit are based on the linear selection index theory
originally developed by Smith (1936), Hazel and Lush (1942), and Hazel (1943).
They are called standard linear selection indices in this introduction. Linear
selection indices that assume that economic weights are fixed but unknown are
based on the linear selection index theory developed by Cerón-Rojas et al. (2008a,
2016) and are called eigen selection index methods. The eigen selection index
methods include the standard linear selection indices as a particular case because
they do not require the economic weights to be known. To understand the eigen
selection index methods theory, the key point is that it is an application of the
canonical correlation theory to the standard linear selection index context. The
multistage linear selection index theory will be described only in the context of the
standard linear selection indices. As we shall see, there are three main types of
LSI: phenotypic, marker, and genomic. Each can be unrestricted, null restricted, or
predetermined proportional gains and can be used in the context of single-stage or
multistage breeding selection schemes.

For each specific selection index described in this book, we have used an 
acronym. For example, the Smith (1936), Hazel and Lush (1942), and Hazel 
(1943) index was denoted LPSI (linear phenotypic selection index), whereas the 
Cerón-Rojas et al. (2008a) index was denoted ESIM (Eigen selection index method), 
etc. For additional details, see Table 1.1 and the Preface of this book. We think that 
such notation gives the reader a more general point of view of the relationship that 
exists among all the indices described in this book. 


Table 1.1 Chapter where the index was described, authors who developed the selection index,
acronym of the index used in this book, and description of the acronym


Chapter | Authors who developed the index | Acronym | Description
2 | Smith (1936), Hazel and Lush (1942), Hazel (1943) | LPSI^a | Linear phenotypic selection index
2 | Williams (1962a) | BLPSI^a | Base linear phenotypic selection index
3 | Kempthorne and Nordskog (1959) | RLPSI^a | Restricted linear phenotypic selection index
3 | Mallard (1972), Harville (1975), Tallis (1985), Itoh and Yamada (1987) | PPG-LPSI^a | Predetermined proportional gain linear phenotypic selection index
3 | Pesek and Baker (1969), Yamada et al. (1975), Itoh and Yamada (1986) | DG-LPSI^a | Desired gains linear phenotypic selection index
4 | Lande and Thompson (1990) | LMSI^b | Linear marker selection index
4 | Lange and Whittaker (2001) | GW-LMSI^b | Genome-wide linear marker selection index
5 | Togashi et al. (2011), Ceron-Rojas et al. (2015) | LGSI^c | Linear genomic selection index
5 | Dekkers (2007) | CLGSI^d | Combined linear genomic selection index
6 | Kempthorne and Nordskog (1959), Ceron-Rojas et al. (2015) | RLGSI^c | Restricted linear genomic selection index
6 | Tallis (1985), Ceron-Rojas et al. (2015) | PPG-LGSI^c | Predetermined proportional gain linear genomic selection index
6 | Kempthorne and Nordskog (1959), Dekkers (2007) | CRLGSI^d | Combined restricted linear genomic selection index
6 | Tallis (1985), Dekkers (2007) | PPG-CLGSI^d | Predetermined proportional gain combined linear genomic selection index
7 | Cerón-Rojas et al. (2008a) | ESIM^a | Eigen selection index method
7 | Cerón-Rojas et al. (2008a) | RESIM^a | Restricted eigen selection index method
7 | Cerón-Rojas et al. (2016) | PPG-ESIM^a | Predetermined proportional gain eigen selection index method
8 | Cerón-Rojas et al. (2008b) | MESIM^b | Molecular eigen selection index method
8 | Crossa and Cerón-Rojas (2011) | GW-ESIM^b | Genome-wide eigen selection index method
8 | Dekkers (2007), Cerón-Rojas et al. (2008b) | GESIM^d | Genomic eigen selection index method
8 | Dekkers (2007), Cerón-Rojas et al. (2008a) | RGESIM^d | Restricted genomic eigen selection index method
8 | Dekkers (2007), Cerón-Rojas et al. (2016) | PPG-GESIM^d | Predetermined proportional gain genomic eigen selection index method
9 | Cochran (1951), Young (1964) | MLPSI^a | Multistage linear phenotypic selection index
9 | Cochran (1951), Young (1964), Kempthorne and Nordskog (1959) | MRLPSI^a | Multistage restricted linear phenotypic selection index
9 | Cochran (1951), Young (1964), Tallis (1985) | MPPG-LPSI^a | Multistage predetermined proportional gain linear phenotypic selection index
9 | Cochran (1951), Young (1964), Ceron-Rojas et al. (2015) | MLGSI^c | Multistage linear genomic selection index
9 | Cochran (1951), Young (1964), Kempthorne and Nordskog (1959), Ceron-Rojas et al. (2015) | MRLGSI^c | Multistage restricted linear genomic selection index
9 | Cochran (1951), Young (1964), Tallis (1985), Ceron-Rojas et al. (2015) | MPPG-LGSI^c | Multistage predetermined proportional gain linear genomic selection index


^a Indices that use only phenotypic information
^b Indices that use marker and phenotypic information jointly
^c Indices that use only genomic information
^d Indices that use genomic and phenotypic information jointly in the prediction of the net genetic
merit


1.1 Standard Linear Selection Indices 


1.1.1 Linear Phenotypic Selection Indices 


Three main linear phenotypic selection indices used to predict the net genetic merit
and select parents for the next selection cycle are the LPSI, the null restricted LPSI
(RLPSI), and the predetermined proportional gains LPSI (PPG-LPSI). The LPSI is
an unrestricted index, whereas the RLPSI and the PPG-LPSI allow null restrictions
and predetermined proportional gain restrictions, respectively, to be imposed
on the expected genetic gain per trait, so that some traits change their
mean values according to a predetermined level while the rest of the trait means remain
unrestricted. All these indices are linear combinations of several observable
and optimally weighted phenotypic trait values.




The simplest linear phenotypic selection index (LPSI) can be written as $I_B = w'y$,
where $w$ is a known vector of economic values and $y$ is a vector of phenotypic
values. We called this index the base linear phenotypic selection index (BLPSI). In
this case, the breeder does not need to estimate any parameters, and some authors
have indicated that the BLPSI is a good predictor of the net genetic merit ($H = w'g$,
where $g$ is a vector of true unobservable breeding values) when no data are available
for estimating the phenotypic ($P$) and genotypic ($G$) covariance matrices. When the
traits are independent and the economic weights are also known, the LPSI can be
written as $I = \sum_{i=1}^{t} w_i h_i^2 y_i$, and when the economic weights are not known, the LPSI is
$I = \sum_{i=1}^{t} h_i^2 y_i$, where $w_i$ is the $i$th economic weight and $h_i^2$ is the heritability of trait $y_i$.
In Chap. 2 (Sects. 2.5.1 and 2.5.2), we will show that the foregoing three indices are
particular cases of the more general LPSI, i.e., $I = b'y$, where $b$ is the vector of index
coefficients and $y$ is the vector of observable trait phenotypic values. In the latter
case, we need to estimate matrices $P$ and $G$.
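
The relationship among these indices can be sketched numerically for two traits. The example below assumes the classical Smith–Hazel solution $b = P^{-1}Gw$ for the general LPSI, which is not derived in this introduction, and checks that with diagonal $P$ and $G$ (independent traits) the coefficients reduce to $w_i h_i^2$. All matrices, weights, phenotypic values, and function names are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(M, x):
    return [dot(row, x) for row in M]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def lpsi_coefficients(P, G, w):
    """Assumed Smith-Hazel solution b = P^{-1} G w."""
    return matvec(inv2(P), matvec(G, w))

w = [2.0, 1.0]    # economic weights (assumed known)
y = [10.0, 4.0]   # phenotypic values of one candidate

# Independent traits: P and G are diagonal, so h_i^2 = G_ii / P_ii.
P_diag = [[4.0, 0.0], [0.0, 10.0]]
G_diag = [[1.0, 0.0], [0.0, 5.0]]
h2 = [G_diag[i][i] / P_diag[i][i] for i in range(2)]

I_base = dot(w, y)                                           # BLPSI: w'y
I_indep = sum(wi * hi * yi for wi, hi, yi in zip(w, h2, y))  # sum of w_i h_i^2 y_i
I_h2 = sum(hi * yi for hi, yi in zip(h2, y))                 # sum of h_i^2 y_i

# The general solution gives the same coefficients when traits are independent.
b_diag = lpsi_coefficients(P_diag, G_diag, w)

# Correlated traits: the coefficients depend on the full P and G.
P = [[4.0, 1.0], [1.0, 10.0]]
G = [[1.0, 0.5], [0.5, 5.0]]
b = lpsi_coefficients(P, G, w)
I_general = dot(b, y)
print(I_base, I_indep, I_h2, b_diag, b, I_general)
```

Only the last case requires estimates of $P$ and $G$; the first needs no parameters, and the two heritability-weighted forms need only the heritabilities.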

The LPSI was originally proposed by Smith (1936) in the plant breeding context; 
later Hazel and Lush (1942) and Hazel (1943) extended the LPSI to the context of 
animal breeding. These authors made a clear distinction between the LPSI and the net 
genetic merit. The net genetic merit was defined as a linear combination of the 
unobservable true breeding values of the traits weighted by their respective economic 
values. In the LPSI theory, the main assumptions are: the genotypic values that make 
up the net genetic merit are composed entirely of the additive effects of genes, the 
LPSI and the net genetic merit have a joint normal distribution, and the regression of 
the net genetic merit on LPSI values is linear. Two of the main parameters of this 
index are the selection response and the expected genetic gain per trait or multi-trait 
selection response. The LPSI selection response is associated with the mean of the net 
genetic merit and was defined as the mean of the progeny of the selected parents or 
the mean of the future population (Cochran 1951). The selection response enables 
breeders to estimate the expected selection progress before carrying it out. This 
information gives improvement programs a clearer orientation and helps to predict 
the success of the adopted selection method and choose the option that is technically 
most effective on a scientific basis (Costa et al. 2008). On the other hand, the LPSI 
expected genetic gain per trait, or multi-trait selection response, is the population 
mean of each trait under selection of the progeny of the selected parents. Thus, 
although the LPSI selection response is associated with the mean of the net genetic 
merit, the LPSI expected genetic gain per trait is associated with the mean of each trait 
under selection. The foregoing definition of selection response and the expected 
genetic gain per trait are valid for all selection indices described in this book. 

One of the main problems of the LPSI is that when it is used to select individuals as
parents for the next selection cycle, the expected means of the traits can increase or
decrease without control. This was the main reason
why Kempthorne and Nordskog (1959) developed the basics of the restricted LPSI
(RLPSI), which allows restrictions equal to zero to be imposed on the expected
genetic gain of some traits whereas the expected genetic gain of other traits increases
(or decreases) without any restrictions being imposed. Based on the results of the
RLPSI, Tallis (1962) and James (1968) proposed a selection index called 
predetermined proportional gains LPSI (PPG-LPSI), which attempts to make some 
traits change their expected genetic gain values based on a predetermined level, 
while the rest of the traits remain without restrictions. Mallard (1972) pointed out 
that the PPG-LPSI proposed by Tallis (1962) and James (1968) does not provide 
optimal genetic gains and was the first to propose an optimal PPG-LPSI based on a 
slight modification of the RLPSI. Other optimal PPG-LPSIs were proposed by 
Harville (1975) and Tallis (1985). Itoh and Yamada (1987) showed that the Mallard 
(1972) index is equal to the Tallis (1985) index and that, except for a proportional 
constant, the Tallis (1985) index is equal to the Harville (1975) index. Thus, in 
reality, there is only one optimal PPG-LPSI. 

In Chap. 3 (Sects. 3.1.1 and 3.2.1), we show that $\mathbf{b}_R = \mathbf{K}\mathbf{b}$ and $\mathbf{b}_P = \mathbf{K}_P\mathbf{b}$ are the
vectors of coefficients of the RLPSI and PPG-LPSI, respectively, where $\mathbf{b}$ is the
LPSI vector of coefficients. Matrices $\mathbf{K}$ and $\mathbf{K}_P$ are idempotent ($\mathbf{K} = \mathbf{K}^2$ and
$\mathbf{K}_P = \mathbf{K}_P^2$), that is, they are projectors. Matrix $\mathbf{K}$ projects $\mathbf{b}$ into a space smaller than the
original space of $\mathbf{b}$ because the restrictions imposed on the expected genetic gains
per trait are equal to zero (Sect. 3.1.1). The reduction of the space into which matrix
$\mathbf{K}$ projects $\mathbf{b}$ will be equal to the number of null restrictions imposed by the breeder
on the expected genetic gain per trait, or multi-trait selection response. In the
PPG-LPSI context, matrix $\mathbf{K}_P$ has the same function as $\mathbf{K}$ (see Sect. 3.2.1 for details).
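
The projector property can be checked numerically. The following is a minimal sketch, assuming the standard closed form of the restricted index, K = I − Q with Q = P⁻¹C(C'P⁻¹C)⁻¹C' and C' = U'G (this form belongs to Chap. 3 and is not stated in this section), and assuming the LPSI vector b = P⁻¹Gw. The covariance matrices, economic weights, and restriction matrix are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical covariance matrices for t = 3 traits (illustrative values only).
P = np.array([[4.0, 1.2, 0.8],
              [1.2, 3.0, 0.5],
              [0.8, 0.5, 2.0]])   # phenotypic covariance matrix
G = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 0.8]])   # genotypic covariance matrix
w = np.array([1.0, 0.5, 2.0])     # hypothetical economic weights

# Restrict trait 1 only (r = 1): a 1 marks the restricted trait.
U = np.array([[1.0], [0.0], [0.0]])

P_inv = np.linalg.inv(P)
b = P_inv @ G @ w                 # unrestricted LPSI coefficients (assumed form)
C = G @ U                         # so that C' = U'G
Q = P_inv @ C @ np.linalg.inv(C.T @ P_inv @ C) @ C.T
K = np.eye(3) - Q                 # assumed closed form of the projector
b_R = K @ b                       # restricted coefficients

idempotent = np.allclose(K @ K, K)
rank_K = int(round(np.trace(K)))  # for a projector, trace equals rank
restricted_gain = U.T @ G @ b_R   # covariance of the index with the restricted trait
```

With one restricted trait out of three, the trace of K is t − r = 2, which matches the statement that the space is reduced by the number of null restrictions, and the covariance between the restricted index and the restricted trait is zero up to rounding.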

The aims of the LPSI, RLPSI, and PPG-LPSI are to: 


1. Predict the unobservable net genetic merit values of the candidates for selection.

2. Maximize the selection response and the expected genetic gain for each trait.

3. Provide the breeder with an objective rule for evaluating and selecting several
traits simultaneously (Baker 1974). 


The LPSI is described in Chap. 2, and the RLPSI and PPG-LPSI are described in
Chap. 3. As we will see, the RLPSI and PPG-LPSI theories can be
extended to all selection indices described in this book. Also, the main objectives of
all these selection indices are the same as those of the LPSI, RLPSI,
and PPG-LPSI.


1.1.2 Linear Marker Selection Indices 


The linear marker selection index (LMSI) and the genome-wide LMSI (GW-LMSI) 
are employed in marker-assisted selection (MAS) and are useful in training 
populations when there is phenotypic and marker information; both are a direct 
application of the LPSI theory to the MAS context. The LMSI was originally 
proposed by Lande and Thompson (1990), and the GW-LMSI was proposed by 
Lange and Whittaker (2001). The fundamental idea of these authors is based on the 
fact that crossing two inbred lines generates linkage disequilibrium between markers
and quantitative trait loci (QTL), which is useful for identifying markers correlated
with the traits of interest and estimating the correlation between each of the selected
markers and the trait; the selection criteria are then based upon this marker
information (Moreau et al. 2007). The LMSI combines information on markers linked to
QTL and the phenotypic values of the traits to predict the net genetic merit of the 
candidates for selection because it is not possible to identify all QTL affecting the 
economically important traits (Li 1998). That is, unless all QTL affecting the traits of 
interest can be identified, phenotypic values should be combined with the marker 
scores to increase LMSI efficiency (Dekkers and Settar 2004). 

Moreau et al. (2000) and Whittaker (2003) found that the LMSI is more effective 
than LPSI only in early generation testing and that LMSI increased costs because of 
molecular marker evaluation. The LMSI assumes that favorable alleles are known, as 
are their average effects on phenotype (Lande and Thompson 1990; Hospital et al. 
1997). This assumption is valid for major gene traits but not for quantitative traits that
are influenced by the environment and by many QTL with small effects that interact
among themselves and with the environment. The LMSI requires regressing phenotypic
values on marker-coded values and, with this information, constructing the marker 
score for each individual candidate for selection, and then combining the marker score 
with phenotypic information using the LMSI to obtain a final prediction of the net 
genetic merit. Several authors (Lange and Whittaker 2001; Meuwissen et al. 2001; 
Dekkers 2007; Heffner et al. 2009) have criticized the LMSI approach because it makes 
inefficient use of the available data. It would be preferable to use all the available data in 
a single step to achieve maximally accurate estimates of marker effects. In addition, 
because the LMSI is based on only a few large QTL effects, it violates the selection 
index assumptions of multivariate normality and small changes in allele frequencies. 

Lange and Whittaker (2001) proposed the genome-wide LMSI (GW-LMSI) as a
possible solution to LMSI problems. The GW-LMSI is a single-stage procedure that 
treats information at each individual marker as a separate trait. Thus, all marker 
information can be entered together with phenotypic information into the 
GW-LMSI, which is then used to predict the net genetic merit and select candidates. 
Both selection indices are described in Chap. 4. 


1.1.3 Linear Genomic Selection Indices 


The linear genomic selection index (LGSI) is a linear combination of genomic
estimated breeding values (GEBVs) and was originally proposed by Togashi et al.
(2011); however, Cerón-Rojas et al. (2015) developed the LGSI theory completely.
The advantage of the LGSI over the other indices lies in the possibility of reducing the 
intervals between selection cycles by more than two thirds. A 4-year breeding cycle 
(including 3 years of field testing) is thus reduced to only 4 months, i.e., the time 
required to grow and cross a plant. As a result, thousands of candidates for selection 
can be evaluated without ever taking them out to the field (Lorenz et al. 2011).

In the LGSI, phenotypic and marker data from the training population are fitted in
a statistical model to estimate all available marker effects; these estimates are then
used to obtain GEBVs that are predictors of breeding values in a testing population
for which there is only marker information. The GEBV can be obtained by
multiplying the genomic best linear unbiased predictor (GBLUP) of the estimated marker
effects in the training population (Van Raden 2008) by the coded marker values
obtained in the testing population in each selection cycle. Applying the LGSI in plant
or animal breeding requires genotyping the candidates for selection to obtain the 
GEBV, and predicting and ranking the net genetic merit of the candidates for 
selection using the LGSI. An additional genomic selection index was given by
Dekkers (2007); however, this index can only be used in training populations
because GEBV and phenotypic information are jointly used to predict the net genetic
merit. Both indices are described in Chap. 5; in Chap. 6, we describe them
in the context of the restricted selection indices.
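
The GEBV workflow described above (estimate marker effects in the training population, multiply them by the marker codes of the testing population, then rank candidates on the index) can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, ridge regression with an arbitrary shrinkage value stands in for the GBLUP estimation of marker effects, and the economic weights are used directly as the index coefficients. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_markers, n_traits = 200, 50, 100, 2

# Hypothetical marker codes (-1, 0, 1) for the training and testing populations.
X_train = rng.integers(-1, 2, size=(n_train, n_markers)).astype(float)
X_test = rng.integers(-1, 2, size=(n_test, n_markers)).astype(float)

# Simulated marker effects and phenotypes; only the training set is phenotyped.
true_effects = rng.normal(0.0, 0.1, size=(n_markers, n_traits))
Y_train = X_train @ true_effects + rng.normal(0.0, 1.0, size=(n_train, n_traits))

# Ridge-regression estimate of all marker effects, one column per trait.
# The shrinkage value is arbitrary and stands in for a GBLUP-type estimator.
shrinkage = 10.0
A = X_train.T @ X_train + shrinkage * np.eye(n_markers)
effects_hat = np.linalg.solve(A, X_train.T @ (Y_train - Y_train.mean(axis=0)))

# GEBVs of the non-phenotyped candidates: marker codes times estimated effects.
gebv = X_test @ effects_hat

# Index value per candidate: GEBVs weighted by hypothetical economic weights.
w = np.array([1.0, 0.5])
index = gebv @ w

# Rank candidates from highest to lowest index value and keep the top 10%.
ranking = np.argsort(-index)
selected = ranking[: n_test // 10]
```

No field data from the testing population enter the calculation, which is the property that lets the selection cycle be shortened.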


1.2 Eigen Selection Index Methods 


The eigen selection index methods are described in Chaps. 7 and 8. As we shall see, 
these indices are only used in training populations and can be unrestricted, restricted, 
and predetermined proportional gains selection indices; they can also use phenotypic 
and/or marker information to predict the net genetic merit. In the context of this
linear selection index theory, it is assumed that economic weights are fixed but
unknown. The eigen selection index methods are based on canonical correlation
theory applied to the context of the LPSI, RLPSI, and related selection indices.


1.2.1 Linear Phenotypic Eigen Selection Index Method 


Cerón-Rojas and Sahagún-Castellanos (2005) and Cerón-Rojas et al. (2006)
proposed a phenotypic selection index in the principal component context that has low
accuracy; later, Cerón-Rojas et al. (2008a, 2016) developed the eigen selection index
method (ESIM), the restricted ESIM (RESIM), and the predetermined proportional
gain ESIM (PPG-ESIM) in the canonical correlations context (Hotelling 1935,
1936). The ESIM is an unrestricted index, but the RESIM and PPG-ESIM allow
null and predetermined restrictions respectively to be imposed on the expected
genetic gains of some traits, whereas the rest remain without restrictions. These
three indices use only phenotypic information to predict the individual net genetic
merit of the candidate for selection; they use the elements of the first eigenvector of
the multi-trait heritability as the index vector of coefficients and the first eigenvalue
of the multi-trait heritability in their selection response. The main objectives of the
three indices are to predict the unobservable net genetic merit values of the
candidates for selection, maximize the selection response and the expected genetic gain
per trait, and provide the breeder with an objective rule for evaluating and selecting
several traits simultaneously. Their main characteristics are:


1. They do not require the economic weights to be known. 

2. The first eigenvector of the multi-trait heritability is used as their vector of 
coefficients, and the first eigenvalue of the multi-trait heritability is used in the 
selection response. 

3. Owing to the properties associated with eigen analysis, it is possible to use the 
theory of similar matrices (Harville 1997) to change the direction and proportion 
of the expected genetic gain values without affecting the accuracy. 

4. The sampling statistical properties of ESIM are known. 

5. The PPG-ESIM does not require a proportional constant. 


Finally, the main theory described in Chap. 7 was developed by Cerón-Rojas et
al. (2008a, 2016) based on the canonical correlation framework. That is, ESIM and
its variants (RESIM, MESIM, PPG-ESIM) are applications of canonical correlation
theory to the LPSI context.
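
The role of the first eigenvector and eigenvalue can be illustrated with a small numerical sketch. It assumes that the multi-trait heritability is the matrix P⁻¹G, where P and G are the phenotypic and genotypic covariance matrices (an assumption here; the formal definition is given in Chap. 7), and it uses hypothetical covariance values. The eigenproblem is solved in an equivalent symmetric form through a Cholesky factor of P, which is a numerical convenience and not necessarily how the original authors compute it.

```python
import numpy as np

# Hypothetical covariance matrices for three traits (illustrative values only).
P = np.array([[4.0, 1.2, 0.8],
              [1.2, 3.0, 0.5],
              [0.8, 0.5, 2.0]])   # phenotypic covariance matrix
G = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 0.8]])   # genotypic covariance matrix

# Solve (P^{-1} G) b = lambda b in symmetric form using P = L L'.
L = np.linalg.cholesky(P)
L_inv = np.linalg.inv(L)
S = L_inv @ G @ L_inv.T                  # same eigenvalues as P^{-1} G
eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order

lam = eigvals[-1]                        # first (largest) eigenvalue
b_esim = L_inv.T @ eigvecs[:, -1]        # corresponding eigenvector of P^{-1} G


def index_heritability(b):
    """Ratio of genetic to phenotypic variance of the index I = b'y."""
    return (b @ G @ b) / (b @ P @ b)


# No other coefficient vector should give a larger ratio than the eigenvector.
rng = np.random.default_rng(1)
random_ratios = [index_heritability(rng.normal(size=3)) for _ in range(1000)]
```

In this sketch no economic weights are supplied: the coefficient vector comes entirely from P and G, and the largest eigenvalue equals the heritability of the resulting index.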


1.2.2 Linear Marker and Genomic Eigen Selection Index 
Methods 


Cerón-Rojas et al. (2008b) and Crossa and Cerón-Rojas (2011) extended the ESIM
to a molecular ESIM (MESIM) and to a genome-wide ESIM (GW-ESIM),
respectively, similar to the linear marker selection index (LMSI) and to the
genome-wide LMSI (GW-LMSI). The MESIM and GW-ESIM have problems similar to
those associated with the LMSI and GW-LMSI respectively (see Chap. 4 for details).
The MESIM and GW-ESIM use phenotypic information and markers linked to QTL
to predict the net genetic merit, but the GW-ESIM omits the molecular selection step
in the prediction. The main difference among the MESIM, the GW-ESIM, the LMSI,
and the GW-LMSI is how they obtain the vector of coefficients: while the LMSI and
GW-LMSI obtain the vector of coefficients according to the LPSI theory, the
MESIM and the GW-ESIM obtain the vector of coefficients based on canonical
correlation analysis and the singular value decomposition theory.

It is possible to extend the ESIM to a genomic ESIM (GESIM), and the
RESIM and the PPG-ESIM can be extended to a restricted genomic ESIM
(RGESIM) and to a predetermined proportional gain genomic ESIM
(PPG-GESIM). These indices use phenotypic and GEBV information jointly to predict the
net genetic merit of the candidates for selection, maximizing the selection response
and optimizing the expected genetic gain per trait. Whereas the GESIM is not
constrained, the RGESIM and the PPG-GESIM allow null and predetermined
restrictions respectively to be imposed on the expected genetic gain to make some
traits change their mean values based on a predetermined level, while the rest of the
traits remain without any restriction.


1.3 Multistage Linear Selection Indices


Multistage linear selection indices are methods of selecting one or more individual
traits available at different times or stages and are applied mainly in animal and tree
breeding, where the traits under consideration become evident at different ages. The
theory of these indices is based on the independent culling level method and the
standard linear selection index theory. There are two main approaches associated
with these indices:


1. The optimal multistage linear selection index, which takes into consideration the
correlation among indices at different stages when selection is made.

2. The selection index updating or decorrelated multistage linear selection index, in
which the correlation among indices at different stages is zero when selection is
made.


These indices can use phenotypic or GEBV information to predict the net genetic
merit, or combine phenotypic and GEBV information in the prediction. They can also be
unrestricted, null restricted, or predetermined proportional gains indices. In this book, we
describe only the optimal multistage linear selection index (Chap. 9), which we
call simply the multistage linear selection index.

Multistage linear selection indices are a cost-saving strategy for improving
multiple traits, because not all traits need to be measured at each stage. Thus, when traits
have a developmental sequence in ontogeny or there are large differences in the costs
of measuring several traits, the gain in efficiency of this index over the LPSI can be
substantial (Xu et al. 1995). Xu and Muir (1992) have indicated that the optimal
multistage linear phenotypic selection index (MLPSI) increases selection intensity on
traits measured at an earlier age, and, with fixed facilities, a greater number of 
individuals can be selected at an earlier age. For example, if some individuals can 
be culled before final traits are measured (e.g., weaning weights in swine and beef 
cattle breeding), savings are realized in terms of feed, labor, and facilities. With the 
LPSI, the same individuals must be measured for each trait; thus, the number of traits 
measured per mature individual is the same as that for an immature individual. 

The original MLPSI was developed by Cochran (1951) in the two-stage context 
and later, Young (1964) and Cunningham (1975) combined the LPSI theory with the 
independent culling method to simultaneously select more than one trait in the 
multistage selection context. This selection method was called multistage selection 
by Cochran (1951) and Young (1964) and multistage index selection by 
Cunningham (1975). 

The MLPSI theory can also be adapted to the genomic selection context, where it 
is possible to develop an optimal multistage unrestricted, restricted, and 
predetermined proportional gains linear genomic selection index. The latter indices
are linear combinations of genomic estimated breeding values (GEBVs) used to predict the
individual net genetic merit and select individual traits available at different stages in
a non-phenotyped testing population and are called multistage linear genomic
selection indices. The advantage of these indices over the other selection indices
lies in the possibility of reducing the intervals between selection cycles or stages by
more than two thirds.

One of the main problems of all the multistage selection indices is that after the 
first selection stage their values could be non-normally distributed. In addition, for 
more than two stages, those indices require computationally sophisticated multiple 
integration techniques to derive selection intensities, and there are problems of 
convergence when the traits and the index values of successive stages are highly 
correlated. Furthermore, the computational time could be unacceptable if the number 
of selection stages becomes too high (Börner and Reinsch 2012). One possible 
solution to these problems was given by Xu and Muir (1992) in the selection 
index updating or decorrelated multistage linear phenotypic selection index context. 
However, one problem with the decorrelated multistage selection index is that its
accuracy and selection response are generally lower than the accuracy and selection
response of the multistage selection index described in this book.


1.4 Stochastic Simulation of Four Linear Phenotypic 
Selection Indices 


Chapter 10 describes a stochastic simulation of four linear indices: LPSI, ESIM,
RLPSI, and RESIM. We think that stochastic simulation can contribute to a better
understanding of the relationship between these indices and of their accuracy in
predicting the net genetic merit.


1.5 RIndSel: Selection Indices with R 


Chapter 11 describes how RIndSel can be used to determine individual candidates as
parents for the next cycle of improvement. RIndSel is a graphical user interface that
uses the selection index theory to perform selection. The index can be a linear
combination of phenotypic values, of genomic estimated breeding values, or of
phenotypic values and marker scores.


1.6 The Lagrange Multiplier Method 


To obtain the constrained linear selection indices (e.g., RLPSI, PPG-LPSI, RESIM) 
described in Chaps. 3, 6, 7, 8, and 9, we used the method of Lagrange multipliers. 
This is a powerful method for finding extreme values (maxima or minima) of 
constrained functions. For example, the covariance between the breeding value vector
(g) and the LPSI (I = b'y) is Cov(I, g) = Gb. In the LPSI context, the Gb vector can
take any value (positive or negative), which could be a problem for some breeding
objectives. That is, the breeder could be interested in improving only (t − r) of t
(r < t) traits, leaving r of them fixed; that is, the expected genetic gains of r traits will
be equal to zero for a specific selection cycle. In such cases, we want r covariances
between the linear combinations of g (U'g) and I = b'y to be zero, i.e.,
Cov(I, U'g) = U'Gb = 0, where U' is a matrix with r 1's and (t − r) 0's; 1 indicates that the trait
is restricted and 0 that the trait is not restricted. This is the main problem of the
RLPSI, and the method of Lagrange multipliers is useful for solving that problem.

In the constrained linear selection indices context, the method of Lagrange
multipliers involves maximizing (or minimizing) the Lagrange function
$L(H, I, \mathbf{g}, \mathbf{v}) = f(H, I) + \mathbf{v}'g(\mathbf{g}, I)$, where the elements of vector $\mathbf{v}$ are called Lagrange multipliers. In the
RLPSI context, $f(H, I) = E[(H - I)^2] = \mathbf{w}'\mathbf{G}\mathbf{w} + \mathbf{b}'\mathbf{P}\mathbf{b} - 2\mathbf{w}'\mathbf{G}\mathbf{b}$ is the mean squared
difference between I and H. Let $g(\mathbf{g}, I) = \mathrm{Cov}(I, \mathbf{U}'\mathbf{g}) = \mathbf{U}'\mathbf{G}\mathbf{b}$ be the covariances between
the linear combinations of g (U'g) and I = b'y, the LPSI. Then, to find the RLPSI vector
of coefficients $\mathbf{b}_R = \mathbf{K}\mathbf{b}$, we need to minimize the Lagrange function
$\mathbf{b}'\mathbf{P}\mathbf{b} + \mathbf{w}'\mathbf{G}\mathbf{w} - 2\mathbf{w}'\mathbf{G}\mathbf{b} + 2\mathbf{v}'\mathbf{C}'\mathbf{b}$, with $\mathbf{C}' = \mathbf{U}'\mathbf{G}$, with respect to vectors $\mathbf{b}$ and
$\mathbf{v}' = [v_1\; v_2\; \cdots\; v_r]$, where $\mathbf{v}$ is a vector of
Lagrange multipliers (see Chap. 3, Sect. 3.1.1 for details). Schott (2005) has given
additional details associated with the method of Lagrange multipliers.


Chapter 2
The Linear Phenotypic Selection Index Theory


Abstract The main distinction in the linear phenotypic selection index (LPSI) 
theory is between the net genetic merit and the LPSI. The net genetic merit is a 
linear combination of the true unobservable breeding values of the traits weighted by 
their respective economic values, whereas the LPSI is a linear combination of several 
observable and optimally weighted phenotypic trait values. It is assumed that the net 
genetic merit and the LPSI have bivariate normal distribution; thus, the regression of 
the net genetic merit on the LPSI is linear. The aims of the LPSI theory are to predict 
the net genetic merit, maximize the selection response and the expected genetic gains 
per trait (or multi-trait selection response), and provide the breeder with an objective 
rule for evaluating and selecting parents for the next selection cycle based on several 
traits. The selection response is the mean of the progeny of the selected parents,
whereas the expected genetic gain per trait, or multi-trait selection response, is the
population mean of each trait under selection in the progeny of the selected parents.
The LPSI allows extra merit in one trait to offset slight defects in another; thus, with 
its use, individuals with very high merit in one trait are saved for breeding even when 
they are slightly inferior in other traits. This chapter describes the LPSI theory and 
practice. We illustrate the theoretical results of the LPSI using real and 
simulated data. We end this chapter with a brief description of the quadratic selection 
index and its relationship with the LPSI. 


2.1 Bases for Construction of the Linear Phenotypic 
Selection Index 


The study of quantitative traits (QTs) in plants and animals is based on the mean and 
variance of phenotypic values of QTs. Quantitative traits are phenotypic expressions 
of plant and animal characteristics that show continuous variability and are the result 
of many gene effects interacting among themselves and with the environment. That is, QTs
are the result of unobservable gene effects distributed across plant or animal
genomes that interact among themselves and with the environment to produce the
observable characteristic plant and animal phenotypes (Mather and Jinks 1971;
Falconer and Mackay 1996).

Fig. 2.1 Distribution of 252 phenotypic means of two maize (Zea mays) F2 population traits: plant
height (PHT, cm; a) and ear height (EHT, cm; b), evaluated in one environment, and of 599
phenotypic means of the grain yield (GY1 and GY2, ton ha⁻¹; c and d respectively) of one double
haploid wheat (Triticum aestivum L.) population evaluated in two environments


The QTs are the traits that concern plant and animal breeders the most. They are 
particularly difficult to analyze because heritable variations of QTs are masked by 
larger nonheritable variations that make it difficult to determine the genotypic values 
of individual plants or animals (Smith 1936). However, as QTs usually have normal 
distribution (Fig. 2.1), it is possible to apply normal distribution theory when 
analyzing this type of data. 

Any phenotypic value of QTs (y) can be divided into two main parts: one related
to the genes and the interactions (g) among them (called genotype), and the other
related to the environmental conditions (e) that affect genetic expression (called
environment effects). Thus, the genotype is the particular assemblage of genes
possessed by the plant or animal, whereas the environment consists of all the
nongenetic circumstances that influence the phenotypic value of the plant or animal
(Cochran 1951; Bulmer 1980; Falconer and Mackay 1996). In the context of only
one environment, the phenotypic value of QTs (y) can be written as


$$y = g + e, \tag{2.1}$$


where g denotes the genotypic values that include all types of gene and interaction 
values, and e denotes the deviations from the mean of g values. For two or more 
environments, Eq. (2.1) can be written as y = g + e + ge, where ge denotes the 
interaction between genotype and environment. Assumptions regarding Eq. (2.1)
are:


1. The expectation of e is zero, E(e) = 0.

2. Across several environments, the expectation of y is equal to the expectation of g,
i.e., $E(g) = \mu_g = E(y) = \mu_y$.

3. The covariance between g and e is equal to 0.
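
These three assumptions can be illustrated with a small simulation. The sketch below draws independent normal values for g and e with hypothetical means and variances (the normality, the sample size, and all numerical values are assumptions made only for this example) and checks that the sample quantities behave as Eq. (2.1) and its assumptions imply.

```python
import random
from statistics import fmean

random.seed(42)
n = 20000
var_g, var_e = 4.0, 6.0   # hypothetical genotypic and environmental variances

# Independent draws, so that E(e) = 0 and Cov(g, e) = 0 hold in expectation.
g = [random.gauss(10.0, var_g ** 0.5) for _ in range(n)]
e = [random.gauss(0.0, var_e ** 0.5) for _ in range(n)]
y = [gi + ei for gi, ei in zip(g, e)]   # Eq. (2.1): y = g + e


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


mean_e = fmean(e)                       # should be close to 0
mean_gap = fmean(y) - fmean(g)          # should be close to 0
cov_ge = cov(g, e)                      # should be close to 0
genetic_share = cov(g, y) / cov(y, y)   # close to var_g / (var_g + var_e)
```

Because g and e are uncorrelated, Cov(g, y) reduces to the variance of g, so the last ratio approaches 4 / (4 + 6) = 0.4 in this example.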


The g value can be partitioned into three additional components: additive genetic
(a) effects (or intra-locus additive allelic interaction), dominant genetic (d) effects
(or intra-locus dominance allelic interaction), and epistasis (i) effects (or inter-loci
allelic interaction) such that g = a + d + i. In this book, we have assumed that g = a.

According to Kempthorne and Nordskog (1959), the following four theoretical 
conditions are necessary to construct a valid LPSI: 


1. The phenotypic value (Eq. 2.1) shall be additively made up of two parts: a
genotypic value (g) (defined as the average of the phenotypic values possible
across a population of environments), and an environmental contribution (e).

2. The genotypic value g is composed entirely of the additive effects of genes and is
thus the individual breeding value.

3. The genotypic economic value of an individual is its net genetic merit.

4. The phenotypic values and the net genetic merit are such that the regression of the
net genetic merit on any linear function of the phenotypic values is linear.


Under assumptions 1 to 4, the offspring of a mating will have a genotypic value 
equal to the average of the breeding values of the parents (Kempthorne and 
Nordskog 1959). Additional conditions for practical objectives are: 


5. Selection is practiced at only one stage of the life cycle. 

6. The generations do not overlap. 

7. All individuals below a certain level of desirability are culled without exception.

8. Selected individuals have equal opportunity to have offspring (Hazel and Lush 
1942). 

9. The LPSI values in the ith selection cycle and the LPSI values in the (i + 1)th 
selection cycle do not correlate. 

10. The correlation between the LPSI and the net genetic merit should be at its
maximum in each selection cycle.


Conditions 5 to 10 indicate that the LPSI is applied in a single-stage context.


2.2 The Net Genetic Merit and the LPSI


Not all the individual traits under selection are equally important from an economic 
perspective; thus, the economic value of a trait determines how important that trait is 
for selection. Economic value is defined as the increase in profit achieved by 
improving a particular trait by one unit (Tomar 1983; Cartuche et al. 2014). This 
means that for several traits, the total economic value is a linear combination of the 
breeding values of the traits weighted by their respective economic values (Smith 
1936; Hazel and Lush 1942; Hazel 1943; Kempthorne and Nordskog 1959); this is 
called the net genetic merit of one individual and can be written as

$$H = \mathbf{w}'\mathbf{g}, \tag{2.2}$$

where $\mathbf{g}' = [g_1\; g_2\; \cdots\; g_t]$ is a vector of true unobservable breeding values and
$\mathbf{w}' = [w_1\; w_2\; \cdots\; w_t]$ is a vector of known and fixed economic weights.


Equation (2.2) has several names, e.g., linear aggregate genotype (Hazel 1943), 
genotypic economic value (Kempthorne and Nordskog 1959), net genetic merit 
(Akbar et al. 1984; Cotterill and Jackson 1985), breeding objective (Mac Neil et al. 
1997), and total economic merit (Cunningham and Tauebert 2009), among others. 
In this book, we call Eq. (2.2) net genetic merit only. The values of H = w'g are 
unobservable but they can be simulated for specific studies, as is seen in the 
examples included in this chapter and in Chap. 10, where four indices have been 
simulated for many selection cycles. 

In practice, the net genetic merit of an individual is not observable; thus, to select 
an individual as parent of the next generation, it is necessary to consider its overall 
merit based on several observable traits; that is, we need to construct an LPSI of 
observable phenotypic values such that the correlation between the LPSI and
H = w'g is at a maximum. The LPSI should be a good predictor of H = w'g and
should be useful for ranking and selecting among individuals with different net
genetic merits. The LPSI for one individual can be written as


$$I = \mathbf{b}'\mathbf{y}, \tag{2.3}$$

where $\mathbf{b}' = [b_1\; b_2\; \cdots\; b_t]$ is the vector of coefficients, t is the number of traits
on I, and $\mathbf{y}' = [y_1\; y_2\; \cdots\; y_t]$ is a vector of observable trait phenotypic values
usually centered with respect to its mean. The LPSI allows extra merit in one trait to
offset slight defects in another. With its use, individuals with very high merit in some 
traits are saved for breeding, even when they are slightly inferior in other traits 
(Hazel and Lush 1942). Only one combination of b values allows the correlation of 
the LPSI with H = w’g for a particular set of traits to be maximized. 
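
The maximization can be illustrated numerically. The sketch below assumes the classical Smith-Hazel solution b = P⁻¹Gw, which is not stated in this section, together with hypothetical phenotypic (P) and genotypic (G) covariance matrices and economic weights. It also assumes Cov(g, y) = G, which follows from Eq. (2.1) when g and e are uncorrelated.

```python
import numpy as np

# Hypothetical covariance matrices and economic weights (illustrative only).
P = np.array([[4.0, 1.2, 0.8],
              [1.2, 3.0, 0.5],
              [0.8, 0.5, 2.0]])   # phenotypic covariance matrix
G = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 0.8]])   # genotypic covariance matrix
w = np.array([1.0, 0.5, 2.0])     # economic weights

# Assumed Smith-Hazel solution: solve P b = G w for the index coefficients.
b_lpsi = np.linalg.solve(P, G @ w)


def correlation(b):
    """Correlation between H = w'g and I = b'y for a coefficient vector b."""
    return (w @ G @ b) / np.sqrt((w @ G @ w) * (b @ P @ b))


rho_max = correlation(b_lpsi)

# Randomly drawn coefficient vectors should never exceed the LPSI correlation.
rng = np.random.default_rng(2)
rho_random = [correlation(rng.normal(size=3)) for _ in range(1000)]
```

One point the sketch makes visible is that the correlation is unchanged when b is multiplied by a positive constant, so the maximizing combination is unique only up to such a scale factor.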

Figure 2.2 indicates that the regression of the net genetic merit on the LPSI is
linear and that the correlation between the LPSI and the net genetic merit is maximal
in each selection cycle. Also, note that the true correlations between the LPSI and the 
net genetic merit, and the true regression coefficients of the net genetic merit over the 
LPSI are the same, but the estimated correlation values between the LPSI and the net 
genetic merit are lower than the true correlation (Fig. 2.2). Table 2.1 indicates that 
the LPSI in the ith selection cycle and the LPSI in the (i + 1)th selection cycle do not 
correlate. However, in practice, the correlation values between any pair of LPSIs 
could be different from zero in successive selection cycles. 

One fundamental assumption of the LPSI is that I = b'y has normal distribution.
This assumption is illustrated in Fig. 2.3 for two real datasets: a maize (Zea mays) F2
population with 252 lines and three traits, grain yield (ton ha⁻¹), plant height
(cm), and ear height (cm), evaluated in one environment; and a double haploid
wheat (Triticum aestivum L.) population with 599 lines and one trait, grain yield
(ton ha⁻¹), evaluated in three environments. Figure 2.3 indicates that, in effect, the
LPSI values approach normal distribution when the number of lines is very large.

Fig. 2.2 True correlation (TC) and estimated correlation (ECO) values between the linear
phenotypic selection index (LPSI) and the net genetic merit for seven selection cycles, and true regression
coefficient (TRC) of the net genetic merit over the LPSI for four traits and 500 genotypes in one
environment simulated for seven selection cycles


Table 2.1 Estimated correlation values between the linear phenotypic selection
index (LPSI) values in seven simulated selection cycles

Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7
3 | 0.256 | 0.225 | 1.000 | 0.198 | 0.276 | 0.267 | 0.213


2.3 Fundamental Parameters of the LPSI


There are two fundamental parameters associated with the LPSI theory: the selection
response (R) and the expected genetic gain per trait (E). In general terms, the
selection response is the difference between the mean phenotypic values of the
offspring ($\mu_O$) of the selected parents and the mean of the entire parental generation
($\mu_P$) before selection, i.e., $R = \mu_O - \mu_P$ (Hazel and Lush 1942; Falconer and Mackay
1996). The expected genetic gain per trait (or multi-trait selection response) is the
covariance between the breeding value vector and the LPSI (I) values divided by
the standard deviation of I ($\sigma_I$), i.e., $\mathrm{Cov}(I, \mathbf{g})/\sigma_I$, multiplied by the
selection intensity. This is one form of the LPSI multi-trait selection response. In the
univariate context, the expected genetic gain per trait is the same as the selection
response.

Fig. 2.3 Maize LPSI (Fig. 2.3a) is the distribution of 252 values of the LPSI constructed with the
phenotypic means of three maize (Zea mays) F2 population traits: grain yield (ton ha⁻¹), PHT (cm)
and EHT (cm), evaluated in one environment. Wheat LPSI (Fig. 2.3b) is the distribution of
599 LPSI values constructed with the phenotypic means of the grain yield (ton ha⁻¹) of a double
haploid wheat (Triticum aestivum L.) population evaluated in three environments

One additional way of defining the selection response is based on the selection
differential (D). The selection differential is the mean phenotypic value of the
individuals selected as parents ($\mu_S$) expressed as a deviation from the population
mean ($\mu_P$) or parental generation before the selection was made (Falconer and
Mackay 1996); that is, $D = \mu_S - \mu_P$. Thus, another way of defining R is as the
part of the expected differential of selection ($D = \mu_S - \mu_P$) that is gained when
selection is applied (Kempthorne and Nordskog 1959); that is


$$R = \frac{\mathrm{Cov}(g, y)}{\sigma_y^2} D = k \sigma_y h^2, \tag{2.4}$$

where $\mathrm{Cov}(g, y) = \sigma_g^2$ is the covariance between g and y, g is the individual breeding
value associated with trait y, $\sigma_y^2$ is the variance of y, $k = D/\sigma_y$ is the standardized
selection differential or selection intensity, and $h^2 = \sigma_g^2/\sigma_y^2$ is the heritability of trait y in
the base population. Heritability ($h^2$) appears in Eq. (2.4) as a measure of the
accuracy with which animals or plants having the highest genetic values can be
chosen by selecting directly for phenotype (Hazel and Lush 1942).

The selection response (Eq. 2.4) is the mean of the progeny of the selected parents 
or the future population mean of the trait under selection (Cochran 1951). Thus, the 
selection response enables breeders to estimate the expected progress of the selection 
before carrying it out. This information gives improvement programs a clearer 
orientation and helps to predict the success of the selection method adopted and 
choose the option that is technically most effective on a scientific basis (Costa et al.
2008). Equation (2.4) is very powerful but its application requires strong assumptions.
For example, Eq. (2.4) assumes that the trait of interest does not correlate with
other traits having causal effects on fitness and, in its multivariate form, the validity of
the predicted change rests on the assumption that all such correlated traits have been
measured and incorporated into the analysis (Morrissey et al. 2010).


2.3.1 The LPSI Selection Response 


The univariate selection response (Eq. 2.4) can also be rewritten as

$$R = k\,\sigma_y h^2 = k\,\sigma_g\,\rho_{gy}, \tag{2.5}$$

where $\sigma_g$ is the square root of $\sigma_g^2$ defined in Eq. (2.4) and $\rho_{gy}$ is the correlation between
$g$ and $y$. Thus, as $H = \mathbf{w}'\mathbf{g}$ and $I = \mathbf{b}'\mathbf{y}$ are univariate random variables, the selection
response of the LPSI ($R_I$) can be written in a similar form as Eq. (2.5), i.e.,

$$R_I = k_I\,\sigma_H\,\rho_{HI}, \tag{2.6}$$

where $\sigma_H$ and $\sigma_I$ are the standard deviations of, and $\rho_{HI}$ the correlation between,
$H = \mathbf{w}'\mathbf{g}$ and $I = \mathbf{b}'\mathbf{y}$ respectively; $k_I = (\mu_{I_A} - \mu_{I_B})/\sigma_I$ is the standardized selection
differential or the selection intensity associated with the LPSI; $\mu_{I_A}$ and $\mu_{I_B}$ are the means
of the LPSI values after and before selection respectively. The right-hand side of Eq. (2.6)
($k_I\sigma_H\rho_{HI}$) indicates that the genetic change due to selection is proportional to $k_I$, $\sigma_H$,
and $\rho_{HI}$ (Kempthorne and Nordskog 1959). Thus, the genetic gain that can be achieved by
selecting for several traits simultaneously within a population of animals or plants is
the product of the selection differential ($k_I$), the standard deviation of $H = \mathbf{w}'\mathbf{g}$ ($\sigma_H$),
and the correlation between $H = \mathbf{w}'\mathbf{g}$ and $I = \mathbf{b}'\mathbf{y}$ ($\rho_{HI}$). Selection intensity $k_I$ is
limited by the rate of reproduction of each species, whereas $\sigma_H$ is relatively beyond
man's control; hence, the greatest opportunity for increasing selection progress is by
ensuring that $\rho_{HI}$ is as large as possible (Hazel 1943). In general, it is assumed that $k_I$
and $\sigma_H$ are fixed and $\mathbf{w}$ known and fixed; hence, $R_I$ is maximized when $\rho_{HI}$ is
maximized only with respect to the LPSI vector of coefficients $\mathbf{b}$.
Equation (2.6) is the mean of $H = \mathbf{w}'\mathbf{g}$ after selection, whereas $\sigma_H^2(1 - v\rho_{HI}^2)$ is its
variance and

$$\rho_{HI}^{*} = \rho_{HI}\sqrt{\frac{1 - v}{1 - v\rho_{HI}^2}}$$

is the correlation between $H = \mathbf{w}'\mathbf{g}$ and $I = \mathbf{b}'\mathbf{y}$ after selection was carried out
(Cochran 1951), where $v = k_I(k_I - \tau)$ and $\tau$ is the truncation point.
For example, if the selection intensity is 5%, $k_I = 2.063$, $\tau = 1.645$, and $v = 0.862$
(Falconer and Mackay 1996, Table A). In R (in this case R denotes a platform for
data analysis, see Kabakoff 2011 for details), the truncation point and selection
intensity can be obtained as v <- qnorm(1 - q) and k <- dnorm(v)/q, respectively,
where q is the proportion retained (in this R snippet the variable v holds the truncation
point $\tau$, not the quantity $v$ defined above). Both the variance and the correlation
($\rho_{HI}^{*}$) are reduced by selection. If $H = \mathbf{w}'\mathbf{g}$ could be selected directly, the gain in
$H = \mathbf{w}'\mathbf{g}$ would be $k_I\sigma_H$. Thus, the gain due to indirect selection using $I = \mathbf{b}'\mathbf{y}$ is a
fraction $\rho_{HI}$ of that due to direct selection using $H = \mathbf{w}'\mathbf{g}$. As $k_I$ increases, $R_I$
increases (Eq. 2.6), $\sigma_H^2(1 - v\rho_{HI}^2)$ and $\rho_{HI}^{*}$ decrease, and the effects are in the same
direction as $\rho_{HI}$ increases (Cochran 1951). These results should be valid for all selection
indices described in this book.
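The truncation point, the selection intensity, and Cochran's factor $v$ can be checked numerically. The following is a minimal Python sketch, assuming normally distributed index values; the function name `truncation_summary` is hypothetical and only the standard library is used.

```python
from statistics import NormalDist


def truncation_summary(q):
    """Return (tau, k, v) for a proportion q retained under truncation selection.

    Assumes the index values follow a normal distribution.
    """
    std_normal = NormalDist()            # standard normal: mean 0, sd 1
    tau = std_normal.inv_cdf(1.0 - q)    # truncation point
    k = std_normal.pdf(tau) / q          # selection intensity (ordinate / proportion)
    v = k * (k - tau)                    # v = k(k - tau), as in Cochran (1951)
    return tau, k, v


tau, k, v = truncation_summary(0.05)
print(f"tau = {tau:.3f}, k = {k:.3f}, v = {v:.3f}")
```

For a 5% proportion retained, the three quantities agree with the values quoted from Falconer and Mackay (1996) to three decimals.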

Smith (1936) gave an additional method to obtain Eq. (2.6). Suppose that we have a
large number of plant lines and we select a proportion $q$ for further propagation. In
addition, assume that the values of $I$ for each line are normally distributed with variance
$\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$; let $I$ be transformed into a variable $u$, with unit variance and mean at zero,
that is, $u = (I - \mu_I)/\sigma_I$, where $\mu_I$ is the mean of $I$. Assume that all $I$ values higher than
a value $I'$ are selected; then the value of $u = u'$ corresponding to any given value of $q$ may be
ascertained from a table of the standard normal probability integral (Fig. 2.4).

Fig. 2.4 Graph of standardized LPSI values showing how a population can be separated sharply at
a given point ($u'$) into a selected fraction ($q$), denoted by the red area, and a remainder that is culled,
denoted by the white area

Assuming that the expectations of $H$ and $I$ are $E(H) = 0$ and $E(I) = \mu_I$, the
conditional expectation of $H$ given $I$ can be written as

$$E(H/I) = \frac{\sigma_{HI}}{\sigma_I^2}(I - \mu_I) = \frac{\sigma_{HI}}{\sigma_I}\,u = B\,\sigma_I\,u,$$

where $B = \sigma_{HI}/\sigma_I^2$, $\sigma_{HI} = \mathbf{w}'\mathbf{G}\mathbf{b}$ is the covariance between $H$ and $I$, and
$\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$ is the variance of $I$. Therefore, if $\sigma_I^2$ and $\sigma_{HI}$ are
fixed, the LPSI selection response ($R_I$) can be obtained as the expectation of the
selected population, which has univariate left truncated normal distribution. A
truncated distribution is a conditional distribution resulting when the domain of
the parent distribution is restricted to a smaller region (Hattaway 2010). In the LPSI
context, a truncated distribution occurs when a sample of individuals from the
parent distribution is selected as parents for the next selection cycle, thus creating a
new population of individuals that follow a truncated normal distribution. Thus, we
need to find $E[E(H/I)] = q^{-1}B\sigma_I E(u)$, or, using integral calculus,

$$E[E(H/I)] = \frac{B\sigma_I}{q}\,\frac{1}{\sqrt{2\pi}}\int_{u'}^{\infty} u\,\exp\{-0.5u^2\}\,du = \frac{\sigma_{HI}}{\sigma_I}\,\frac{z}{q} = \frac{z}{q}\,\sigma_H\,\rho_{HI}, \tag{2.7}$$

where $z = \exp\{-0.5u'^2\}/\sqrt{2\pi}$ is the height of the ordinate of the normal curve at the lowest
value of $u$ retained ($u'$) and $q$ is the proportion of the population of animal or plant lines
that is selected (Fig. 2.4). The proportion $q$ that must be saved depends on the
reproductive rate and longevity of the species under consideration and on whether
the population is expanding, stationary or declining in numbers. The ordinate ($z$) of
the normal curve is determined by the proportion selected ($q$) (Fig. 2.4). The
amount of progress is expected to be larger as $q$ becomes smaller; that is, as
selection becomes more intense (Hazel and Lush 1942). Kempthorne and
Nordskog (1959) showed that $z/q = k_I$. Thus, Eqs. (2.6) and (2.7) are the same,
that is, $E[E(H/I)] = R_I$.
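The identity behind Eq. (2.7), namely that the mean of the selected standardized values equals the ordinate divided by the proportion selected, can be verified by numerical integration. This is only an illustrative sketch: the helper `mean_of_selected`, the upper integration limit, and the number of steps are assumptions made for the example.

```python
from statistics import NormalDist


def mean_of_selected(q, upper=10.0, steps=20_000):
    """Trapezoid-rule mean of a standard normal variable truncated below at u'.

    `upper` stands in for infinity; the normal density beyond 10 is negligible.
    """
    nd = NormalDist()
    u_cut = nd.inv_cdf(1.0 - q)          # lowest standardized value retained (u')
    h = (upper - u_cut) / steps
    total = 0.0
    for i in range(steps + 1):
        u = u_cut + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * u * nd.pdf(u)  # integrand u * phi(u)
    return total * h / q                 # divide by q to condition on selection


q = 0.05
nd = NormalDist()
z = nd.pdf(nd.inv_cdf(1.0 - q))          # ordinate at the truncation point
numeric = mean_of_selected(q)
closed_form = z / q                      # equals the selection intensity
print(round(numeric, 4), round(closed_form, 4))
```

Both numbers coincide up to integration error, which is the equality between the integral in Eq. (2.7) and the selection intensity in Eq. (2.6).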


2.3.2 The Maximized Selection Response 


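The main objective of the LPSI is to maximize the mean of $H = \mathbf{w}'\mathbf{g}$ (Eq. 2.7).
Assuming that $\mathbf{P}$, $\mathbf{G}$, $\mathbf{w}$, and $k_I$ are known, to maximize $R_I$ we can either maximize
$\rho_{HI}$ or minimize the mean squared difference between $I$ and $H$,
$E[(H - I)^2] = \mathbf{w}'\mathbf{G}\mathbf{w} + \mathbf{b}'\mathbf{P}\mathbf{b} - 2\mathbf{w}'\mathbf{G}\mathbf{b}$, with respect to $\mathbf{b}$, that is,

$$\frac{\partial}{\partial \mathbf{b}}E[(H - I)^2] = 2\mathbf{P}\mathbf{b} - 2\mathbf{G}\mathbf{w} = \mathbf{0},$$

from where

$$\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w} \tag{2.8}$$

is the vector that simultaneously minimizes $E[(H - I)^2]$ and maximizes $\rho_{HI}$, and then
$R_I = k_I\sigma_H\rho_{HI}$. By Eq. (2.8), the maximized LPSI selection response can be written as

$$R_I = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}. \tag{2.9}$$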


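The maximized LPSI selection response predicts the mean improvement in $H$ due
to indirect selection on $I$ only when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ (Harris 1964) and is proportional to
the standard deviation of the LPSI ($\sigma_I$) and the standardized selection
differential or the selection intensity ($k_I$).

The maximized LPSI selection response (Eq. 2.9) is related to the Cauchy-Schwarz
inequality (Rao 2002; Cerón-Rojas et al. 2006), which establishes that
for any pair of vectors $\mathbf{u}$ and $\mathbf{v}$, if $\mathbf{A}$ is a positive definite matrix, then the inequality
$(\mathbf{u}'\mathbf{v})^2 \le (\mathbf{v}'\mathbf{A}\mathbf{v})(\mathbf{u}'\mathbf{A}^{-1}\mathbf{u})$ holds. Kempthorne and Nordskog (1959) proved that
maximizing

$$\rho_{HI}^2 = \frac{(\mathbf{w}'\mathbf{G}\mathbf{b})^2}{(\mathbf{w}'\mathbf{G}\mathbf{w})(\mathbf{b}'\mathbf{P}\mathbf{b})}$$

also maximizes $R_I$. According to Eqs. (2.6) and (2.7), $R_I$ can be written as
$R_I = k_I\,\mathbf{w}'\mathbf{G}\mathbf{b}/\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$, such that maximizing $R_I$ is equivalent to
maximizing $(\mathbf{w}'\mathbf{G}\mathbf{b})^2/(\mathbf{b}'\mathbf{P}\mathbf{b})$. Let $\mathbf{G}\mathbf{w} = \mathbf{u}$, $\mathbf{b} = \mathbf{v}$, and $\mathbf{A} = \mathbf{P}$; by the Cauchy-Schwarz
inequality,

$$\frac{(\mathbf{w}'\mathbf{G}\mathbf{b})^2}{\mathbf{b}'\mathbf{P}\mathbf{b}} \le \mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}.$$

This implies that the maximum is reached when
$(\mathbf{w}'\mathbf{G}\mathbf{b})^2/(\mathbf{b}'\mathbf{P}\mathbf{b}) = \mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}$, at which point
$R_I = k_I\sqrt{\mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}}$. This latter result is the
same as Eq. (2.9) when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$.

The result $R_I = k_I\sqrt{\mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}}$ obtained using the Cauchy-Schwarz inequality
corroborates that $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ (Eq. 2.8) is a global minimum when the mean squared
difference between $I$ and $H$ ($E[(H - I)^2]$) is minimized, and a global maximum when
the correlation $\rho_{HI}$ between $I$ and $H$ is maximized, because
$R_I = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = k_I\sqrt{\mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}}$ only when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$.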


2.3.3 The LPSI Expected Genetic Gain Per Trait 


Whereas $R = \frac{\mathrm{Cov}(g, y)}{\sigma_y^2}D = k\,\frac{\mathrm{Cov}(g, y)}{\sigma_y}$ (Eq. 2.4) denotes the selection response in the
univariate case, $\mathbf{E} = k_I\,\frac{\mathrm{Cov}(I, \mathbf{g})}{\sigma_I}$ denotes the LPSI expected genetic gain per trait. Also,
apart from $k_I$, $\frac{\mathrm{Cov}(g, y)}{\sigma_y}$ and $\frac{\mathrm{Cov}(I, \mathbf{g})}{\sigma_I}$ are mathematically equivalent: whereas
$\frac{\mathrm{Cov}(g, y)}{\sigma_y}$ is the covariance between $g$ and $y$ weighted by the standard deviation
of $y$, $\frac{\mathrm{Cov}(I, \mathbf{g})}{\sigma_I}$ is the covariance between the breeding value vector and the LPSI
values weighted by the standard deviation of the LPSI. This means that,
in effect, $\mathbf{E}$ is the LPSI multi-trait selection response and can be written as

$$\mathbf{E} = k_I\,\frac{\mathbf{G}\mathbf{b}}{\sigma_I}, \tag{2.10}$$

where $\mathbf{G}$, $\sigma_I$ and $k_I$ were defined earlier. As Eq. (2.10) is the covariance between
$I = \mathbf{b}'\mathbf{y}$ and $\mathbf{g}' = [g_1\ g_2\ \ldots\ g_t]$ divided by $\sigma_I$ (and multiplied by $k_I$), considering $g_j$ and
$I = \sum_{i=1}^{t} b_i y_i$, the genetic gain in the jth index trait due to selection on $I$ will be

$$E_j = \frac{k_I}{\sigma_I}\mathrm{Cov}(I, g_j) = \frac{k_I}{\sigma_I}\left(b_1\sigma_{g_{1j}} + b_2\sigma_{g_{2j}} + \cdots + b_j\sigma_{g_j}^2 + \cdots + b_t\sigma_{g_{tj}}\right) = k_I\,\frac{\mathbf{b}'\boldsymbol{\sigma}_j}{\sigma_I}, \tag{2.11}$$

where $\boldsymbol{\sigma}_j' = [\sigma_{g_{1j}}\ \sigma_{g_{2j}}\ \ldots\ \sigma_{g_j}^2\ \ldots\ \sigma_{g_{tj}}]$ is a vector of genotypic covariances of the jth index
trait with all the index traits (Lin 1978; Brascamp 1984).

If Eq. (2.11) is multiplied by its economic weight, we obtain a measure of the
economic value of each trait included in the net genetic merit (Cunningham and
Tauebert 2009). In percentage terms, the economic value attributable to genetic
change in the jth trait can be written as

$$\frac{w_j\,\mathbf{b}'\boldsymbol{\sigma}_j}{\sigma_I^2}\,100. \tag{2.12}$$

In addition, the percentage reduction in the overall genetic gain in the net genetic
merit if the jth trait is omitted from the LPSI (Cunningham and Tauebert 2009) is

$$\left[1 - \sqrt{1 - \frac{b_j^2}{\sigma_I^2\,\varphi_j^2}}\,\right]100, \tag{2.13}$$

where $\varphi_j^2$ is the jth diagonal element of the inverse of the phenotypic covariance
matrix $\mathbf{P}^{-1}$ and $b_j^2$ is the square of the jth coefficient of the LPSI. Equations (2.12) and
(2.13) are measures of the importance of each trait included in the LPSI when
selection is made.


2.3.4 Heritability of the LPSI 


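As the variance of $I = \mathbf{b}'\mathbf{y}$ is equal to $\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b} = \mathbf{b}'\mathbf{G}\mathbf{b} + \mathbf{b}'\mathbf{R}\mathbf{b}$, where $\mathbf{P} = \mathbf{G} + \mathbf{R}$, and
$\mathbf{P}$, $\mathbf{G}$ and $\mathbf{R}$ are the phenotypic, genetic, and residual covariance matrices respectively,
the LPSI heritability (Lin and Allaire 1977; Nordskog 1978) can be written as

$$h_I^2 = \frac{\mathbf{b}'\mathbf{G}\mathbf{b}}{\mathbf{b}'\mathbf{P}\mathbf{b}}. \tag{2.14}$$

When selecting a trait, the correlation between the phenotypic and genotypic
values is equal to the square root of the trait's heritability ($\rho_{gy} = h$); however, in the
LPSI context, when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$, the maximized correlation between $H$ and $I$ is
$\rho_{HI} = \sqrt{\frac{\mathbf{b}'\mathbf{P}\mathbf{b}}{\mathbf{w}'\mathbf{G}\mathbf{w}}} = \frac{\sigma_I}{\sigma_H}$, whereas $h_I = \sqrt{\frac{\mathbf{b}'\mathbf{G}\mathbf{b}}{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ is the square root of the heritability of $I$; that is,
from a mathematical point of view, $\rho_{HI} \ne h_I$. In practice, $h_I^2$ and $\rho_{HI}^2$ give similar
results (Fig. 2.5).

Fig. 2.5 Estimated values of the square correlation between the LPSI and the net genetic merit
(H = w'g) and the LPSI heritability for four traits and 500 genotypes in one environment simulated
for seven selection cycles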


2.4 Statistical LPSI Properties 


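Assuming that $H$ and $I$ have joint bivariate normal distribution, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$, and $\mathbf{P}$,
$\mathbf{G}$ and $\mathbf{w}$ are known, the statistical LPSI properties (Henderson 1963) are the
following:

1. The variance of $I$ ($\sigma_I^2$) and the covariance between $H$ and $I$ ($\sigma_{HI}$) are equal, i.e.,
$\sigma_I^2 = \sigma_{HI}$. We can demonstrate this property noting that as $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$,
$\sigma_I^2 = \mathbf{b}'\mathbf{P}\mathbf{b}$, and $\sigma_{HI} = \mathbf{w}'\mathbf{G}\mathbf{b}$, then $\sigma_I^2 = (\mathbf{w}'\mathbf{G}\mathbf{P}^{-1})\mathbf{P}(\mathbf{P}^{-1}\mathbf{G}\mathbf{w}) = \mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}$, and
$\sigma_{HI} = \mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}$; i.e., $\sigma_I^2 = \sigma_{HI}$. This last result implies that when $\mu_I = 0$,
$E(H/I) = I$.

2. The maximized correlation between $H$ and $I$ is equal to $\rho_{HI} = \sigma_I/\sigma_H$. That is,

$$\rho_{HI} = \frac{\mathbf{w}'\mathbf{G}\mathbf{b}}{\sqrt{\mathbf{w}'\mathbf{G}\mathbf{w}}\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} = \frac{\mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}}{\sqrt{\mathbf{w}'\mathbf{G}\mathbf{w}}\sqrt{\mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}}} = \frac{\sqrt{\mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{w}}}{\sqrt{\mathbf{w}'\mathbf{G}\mathbf{w}}} = \frac{\sigma_I}{\sigma_H}.$$

3. The variance of the predicted error, $\mathrm{Var}(H - I) = (1 - \rho_{HI}^2)\sigma_H^2$, is minimal. Note
that $\mathrm{Var}(H - I) = E[(H - I)^2] = \sigma_I^2 + \sigma_H^2 - 2\sigma_{HI}$, and when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$,
$\sigma_I^2 = \sigma_{HI}$, from where $\mathrm{Var}(H - I) = \sigma_H^2 - \sigma_I^2 = (1 - \rho_{HI}^2)\sigma_H^2$ is minimal because,
by Eq. (2.8), $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ minimizes $\mathrm{Var}(H - I)$. Thus, the larger
$\rho_{HI}$, the smaller $E[(H - I)^2]$ and the more similar $I$ and $H$ are. If $\rho_{HI} > 0$, $I$ and
$H$ tend to be positively related; if $\rho_{HI} < 0$, they tend to be negatively related; and if
$\rho_{HI} = 0$, $I$ and $H$ are independent (Anderson 2003).

4. The total variance of $H$ explained by $I$ is $\sigma_I^2 = \rho_{HI}^2\sigma_H^2$. It is evident that if $\rho_{HI} = 1$,
$\sigma_I^2 = \sigma_H^2$, and if $\rho_{HI} = 0$, $\sigma_I^2 = 0$. That is, the variance of $H$ explained by $I$ is
proportional to $\rho_{HI}^2$, and when $\rho_{HI}$ is close to 1, $\sigma_I^2$ is close to $\sigma_H^2$, and if $\rho_{HI}$ is close
to 0, $\sigma_I^2$ is close to 0.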
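Properties 1 to 3 can be checked numerically. The sketch below borrows the estimated matrices and economic weights from the numerical example of Sect. 2.8.2 and treats them as if they were the known parameters; the helper functions `matvec`, `dot`, and `solve` are hypothetical names written for this illustration with the Python standard library only.

```python
import math

# Estimated matrices and weights from the chapter's numerical example (Sect. 2.8.2),
# used here as if they were the true parameters.
P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
w = [1.0, -1.0, 1.0]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                      # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


Gw = matvec(G, w)
b = solve(P, Gw)                           # b = P^{-1} G w without forming the inverse
var_I = dot(b, matvec(P, b))               # sigma_I^2 = b'Pb
cov_HI = dot(w, matvec(G, b))              # sigma_HI = w'Gb
var_H = dot(w, Gw)                         # sigma_H^2 = w'Gw
rho_HI = cov_HI / math.sqrt(var_H * var_I)
err_var = var_H + var_I - 2.0 * cov_HI     # Var(H - I)

print(round(var_I, 3), round(cov_HI, 3))                          # property 1
print(round(rho_HI, 4), round(math.sqrt(var_I / var_H), 4))       # property 2
print(round(err_var, 3), round((1 - rho_HI ** 2) * var_H, 3))     # property 3
```

Each printed pair should agree, because the equalities hold exactly whenever b solves Pb = Gw.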


2.5 Particular Cases of the LPSI 
2.5.1 The Base LPSI 


To derive the LPSI theory, we assumed that the phenotypic ($\mathbf{P}$) and the genotypic
($\mathbf{G}$) covariance matrices, and the vector of economic values ($\mathbf{w}$) are known. However,
$\mathbf{P}$, $\mathbf{G}$, and $\mathbf{w}$ are generally unknown and it is necessary to estimate them. There are
many methods for estimating $\mathbf{P}$ and $\mathbf{G}$ (Lynch and Walsh 1998) and $\mathbf{w}$ (Cotterill and
Jackson 1985; Magnussen 1990). However, when the estimator of $\mathbf{P}$ ($\hat{\mathbf{P}}$) is not
positive definite (all eigenvalues positive) or the estimator of $\mathbf{G}$ ($\hat{\mathbf{G}}$) is not positive
semidefinite (no negative eigenvalues), the estimator of $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ ($\hat{\mathbf{b}} = \hat{\mathbf{P}}^{-1}\hat{\mathbf{G}}\mathbf{w}$)
could be biased. In this case, the base linear phenotypic selection index (BLPSI):

$$I_B = \mathbf{w}'\mathbf{y} \tag{2.15}$$

may be a better predictor of $H = \mathbf{w}'\mathbf{g}$ than the estimated LPSI $\hat{I} = \hat{\mathbf{b}}'\mathbf{y}$ (Williams
1962a; Lin 1978) if the vector of economic values $\mathbf{w}$ is indeed known. Many authors
(Williams 1962b; Harris 1964; Hayes and Hill 1980, 1981) have investigated the
influence of parameter estimation errors on LPSI accuracy and concluded that those
errors affect the accuracy of $\hat{I} = \hat{\mathbf{b}}'\mathbf{y}$ when the accuracy of $\hat{\mathbf{P}}$ and $\hat{\mathbf{G}}$ is low. If vector
$\mathbf{w}$ values are known, the BLPSI has certain advantages because of its simplicity and
its freedom from parameter estimation errors (Lin 1978). Williams (1962a) pointed
out that the BLPSI is superior to $\hat{I} = \hat{\mathbf{b}}'\mathbf{y}$ unless a large amount of data is available for
estimating $\mathbf{P}$ and $\mathbf{G}$.

There are some problems associated with the BLPSI. For example, what is the
BLPSI selection response and the BLPSI expected genetic gains per trait when no
data are available for estimating $\mathbf{P}$ and $\mathbf{G}$? The BLPSI is a better selection index than
the standard LPSI only if the correlation between the BLPSI and the net genetic merit
is higher than that between the LPSI and the net genetic merit (Hazel 1943).
However, if estimations of $\mathbf{P}$ and $\mathbf{G}$ are not available, how can the correlation
between the base index and the net genetic merit be obtained? Williams (1962b)
pointed out that the correlation between the BLPSI and $H = \mathbf{w}'\mathbf{g}$ can be written as

$$\rho_{HI_B} = \sqrt{\frac{\mathbf{w}'\mathbf{G}\mathbf{w}}{\mathbf{w}'\mathbf{P}\mathbf{w}}} \tag{2.16}$$

and indicated that the ratio $\rho_{HI_B}/\rho_{HI}$ can be used to compare LPSI efficiency versus
BLPSI efficiency; however, in the latter case, at least the estimates of $\mathbf{P}$ and $\mathbf{G}$, i.e., $\hat{\mathbf{P}}$
and $\hat{\mathbf{G}}$, need to be known.

In addition, Eq. (2.15) is only an assumption, not a result, and implies that $\mathbf{P}$ and
$\mathbf{G}$ are the same. That is, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w} = \mathbf{w}$ only when $\mathbf{P} = \mathbf{G}$, which indicates that the
BLPSI is a special case of the LPSI. Thus, to obtain the selection response and the
expected genetic gains per trait of the BLPSI, we need some information about $\mathbf{P}$ and
$\mathbf{G}$. Assuming that the BLPSI is indeed a particular case of the LPSI, the BLPSI
selection response and the BLPSI expected genetic gains per trait could be written as

$$R_B = k_I\sqrt{\mathbf{w}'\mathbf{P}\mathbf{w}} \tag{2.17}$$

and

$$\mathbf{E}_B = k_I\,\frac{\mathbf{G}\mathbf{w}}{\sqrt{\mathbf{w}'\mathbf{P}\mathbf{w}}} \tag{2.18}$$

respectively. The parameters of Eqs. (2.17) and (2.18) were defined earlier.

There are additional implications if $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w} = \mathbf{w}$. For example, if $\mathbf{P} = \mathbf{G}$, then
$\rho_{HI_B} = \sqrt{\frac{\mathbf{w}'\mathbf{G}\mathbf{w}}{\mathbf{w}'\mathbf{P}\mathbf{w}}}$ and the BLPSI heritability $h_{I_B}^2 = \frac{\mathbf{w}'\mathbf{G}\mathbf{w}}{\mathbf{w}'\mathbf{P}\mathbf{w}}$ are equal to 1. However, in
practice, the estimated values of $\rho_{HI_B}$ ($\hat{\rho}_{HI_B}$) are usually lower than the estimated
values of $\rho_{HI}$ ($\hat{\rho}_{HI}$) (Fig. 2.6).


2.5.2 The LPSI for Independent Traits 


Suppose that the traits under selection are independent; then $\mathbf{P}$ and $\mathbf{G}$ are diagonal
matrices and $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ is a vector of single-trait heritabilities multiplied by the
economic weights, because $\mathbf{P}^{-1}\mathbf{G}$ is the matrix of multi-trait heritabilities (Xu and
Muir 1992). Based on this result, Hazel and Lush (1942) and Smith et al. (1981) used
trait heritabilities multiplied by the economic weights (or heritabilities only) as
coefficients of the LPSI. Thus, when the traits are independent and the economic
weights are known, the LPSI can be constructed as

$$I = \sum_{i=1}^{t} w_i h_i^2 y_i, \tag{2.19}$$

and when the economic weights are unknown, the LPSI can be constructed as

$$I = \sum_{i=1}^{t} h_i^2 y_i. \tag{2.20}$$

The selection response of Eqs. (2.19) and (2.20) can be seen in Hazel and Lush
(1942).

Fig. 2.6 Values of the true correlation between the LPSI and the net genetic merit (H = w'g) (True-C),
the estimated correlation between the LPSI and H (LPSI-C), and the estimated correlation
between the base index and H (Base-C) for four traits and 500 genotypes in one environment
simulated for seven selection cycles


2.6 Criteria for Comparing LPSI Efficiency 


Assuming that the intensity of selection is the same in both indices, we can compare
BLPSI ($I_B = \mathbf{w}'\mathbf{y}$) efficiency versus LPSI efficiency to predict the net genetic merit in
percentage terms as

$$p = 100(\lambda - 1), \tag{2.21}$$

where $\lambda = \rho_{HI}/\rho_{HI_B}$ (Williams 1962b; Bulmer 1980). Therefore, when $p = 0$, the
efficiency of both indices is the same; when $p > 0$, the efficiency of the LPSI is
higher than the base index efficiency, and when $p < 0$, the base index efficiency is
higher than LPSI efficiency (Fig. 2.6). Equation (2.21) is useful for comparing the
efficiency of any linear selection index, as we shall see in this book.


2.7 Estimating Matrices G and P


To derive the LPSI theory we assumed that matrices $\mathbf{P}$ and $\mathbf{G}$ are known. In practice,
we have to estimate them. Matrices $\mathbf{P}$ and $\mathbf{G}$ can be estimated by analysis of variance
(ANOVA), maximum likelihood or restricted maximum likelihood (REML) (Baker
1986; Lynch and Walsh 1998; Searle et al. 2006; Hallauer et al. 2010). Equation
(2.1) is the simplest model because we only need to estimate two variance components:
the genotypic variance ($\sigma_g^2$) and the residual variance ($\sigma_e^2$), from where the
phenotypic variance for trait $y$ is the sum of $\sigma_g^2$ and $\sigma_e^2$, that is, $\sigma_y^2 = \sigma_g^2 + \sigma_e^2$.

However, to construct matrices $\mathbf{P}$ and $\mathbf{G}$, we also need the covariance between any
two traits. Thus, if $y_i$ and $y_j$ ($i, j = 1, 2, \ldots, t$) are any two traits, then the covariance
between $y_i$ and $y_j$ ($\sigma_{y_{ij}}$) can be written as $\sigma_{y_{ij}} = \sigma_{g_{ij}} + \sigma_{e_{ij}}$, where $\sigma_{g_{ij}}$ and $\sigma_{e_{ij}}$ denote the
genotypic and residual covariance respectively of traits $y_i$ and $y_j$.

Several authors (Baker 1986; Lynch and Walsh 1998; Hallauer et al. 2010) have 
described ANOVA methods for estimating matrix G using specific design data, for 
example, half-sib, full-sib, etc., when the sample sizes are well balanced. In the 
ANOVA method, observed mean squares are equated to their expected values; the
expected values are linear functions of the unknown variance components; thus the 
resulting equations are a set of simultaneous linear equations in the variance 
components. The expected values of mean squares in the ANOVA method do not 
need assumptions of normality because the variance component estimators do not 
depend on normality assumptions (Lynch and Walsh 1998; Hallauer et al. 2010). 

In cases where the sample sizes are not well balanced, Lynch and Walsh (1998) 
and Fry (2004) proposed using the REML method to estimate matrix G. The REML 
estimation method does not require a specific design or balanced data and can be 
used to estimate genetic and residual variance and covariance in any arbitrary 
pedigree of individuals. The REML method is based on projecting the data in a 
subspace free of fixed effects and maximizing the likelihood function in this 
subspace, and has the advantage of producing the same results as the ANOVA in 
balanced designs (Blasco 2001). 

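In the context of the linear mixed model, Lynch and Walsh (1998) have given
formulas for estimating variances $\sigma_g^2$ and $\sigma_e^2$ that can be adapted to estimate
covariances $\sigma_{g_{ij}}$ and $\sigma_{e_{ij}}$. Suppose that we want to estimate $\sigma_{g_q}^2$ and $\sigma_{e_q}^2$ for the qth
trait ($q = 1, 2, \ldots, t$; $t$ = number of traits) in the absence of dominance and epistatic
effects using the model $\mathbf{y}_q = \mathbf{1}\mu_q + \mathbf{Z}\mathbf{g}_q + \mathbf{e}_q$, where the vector of averages
$\mathbf{y}_q \sim NMV(\mathbf{1}\mu_q, \mathbf{V}_q)$ is $g \times 1$ ($g$ = number of genotypes in the population) and has multivariate
normal distribution; $\mathbf{1}$ is a $g \times 1$ vector of ones, $\mu_q$ is the mean of the qth trait, $\mathbf{Z}$ is an
identity matrix $g \times g$, $\mathbf{g}_q \sim NMV(\mathbf{0}, \mathbf{A}\sigma_{g_q}^2)$ is a vector of true breeding values, and
$\mathbf{e}_q \sim NMV(\mathbf{0}, \mathbf{I}\sigma_{e_q}^2)$ is a $g \times 1$ vector of residuals, where NMV stands for normal
multivariate distribution. Matrix $\mathbf{A}$ denotes the numerical relationship matrix
between individuals (Lynch and Walsh 1998; Mrode 2005) and $\mathbf{V}_q = \mathbf{A}\sigma_{g_q}^2 + \mathbf{I}\sigma_{e_q}^2$.

The expectation-maximization algorithm allows the REML to be computed for
the variance components $\sigma_{g_q}^2$ and $\sigma_{e_q}^2$ by iterating the following equations:

$$\sigma_{g_q}^{2(n+1)} = \sigma_{g_q}^{2(n)} + \frac{\left(\sigma_{g_q}^{2(n)}\right)^2}{g}\left[\mathbf{y}_q'\left(\mathbf{T}^{(n)}\mathbf{A}\mathbf{T}^{(n)}\right)\mathbf{y}_q - \mathrm{tr}\left(\mathbf{T}^{(n)}\mathbf{A}\right)\right] \tag{2.22}$$

and

$$\sigma_{e_q}^{2(n+1)} = \sigma_{e_q}^{2(n)} + \frac{\left(\sigma_{e_q}^{2(n)}\right)^2}{g}\left[\mathbf{y}_q'\left(\mathbf{T}^{(n)}\mathbf{T}^{(n)}\right)\mathbf{y}_q - \mathrm{tr}\left(\mathbf{T}^{(n)}\right)\right], \tag{2.23}$$

where, after n iterations, $\sigma_{g_q}^{2(n+1)}$ and $\sigma_{e_q}^{2(n+1)}$ are the estimated variance components of
$\sigma_{g_q}^2$ and $\sigma_{e_q}^2$ respectively; tr(.) denotes the trace of the matrices within brackets;
$\mathbf{T} = \mathbf{V}_q^{-1} - \mathbf{V}_q^{-1}\mathbf{1}\left(\mathbf{1}'\mathbf{V}_q^{-1}\mathbf{1}\right)^{-1}\mathbf{1}'\mathbf{V}_q^{-1}$ and $\mathbf{V}_q^{-1}$ is the inverse of matrix $\mathbf{V}_q = \mathbf{A}\sigma_{g_q}^2 + \mathbf{I}\sigma_{e_q}^2$.
In $\mathbf{T}^{(n)}$, $\mathbf{V}_q^{-1(n)}$ is the inverse of matrix $\mathbf{V}_q^{(n)} = \mathbf{A}\sigma_{g_q}^{2(n)} + \mathbf{I}\sigma_{e_q}^{2(n)}$.

The additive genetic and residual covariances between the observations of the qth
and ith traits, $\mathbf{y}_q$ and $\mathbf{y}_i$ ($\sigma_{g_{iq}}$ and $\sigma_{e_{iq}}$; $q, i = 1, 2, \ldots, t$), can be estimated using
REML by adapting Eqs. (2.22) and (2.23). Note that the variance of the sum of $\mathbf{y}_q$
and $\mathbf{y}_i$ can be written as $\mathrm{Var}(\mathbf{y}_i + \mathbf{y}_q) = \mathbf{V}_i + \mathbf{V}_q + 2\mathbf{C}_{iq}$, where $\mathbf{V}_i = \mathbf{A}\sigma_{g_i}^2 + \mathbf{I}\sigma_{e_i}^2$ is the
variance of $\mathbf{y}_i$ and $\mathbf{V}_q = \mathbf{A}\sigma_{g_q}^2 + \mathbf{I}\sigma_{e_q}^2$ is the variance of $\mathbf{y}_q$; in addition,
$2\mathbf{C}_{iq} = 2\mathbf{A}\sigma_{g_{iq}} + 2\mathbf{I}\sigma_{e_{iq}} = 2\mathrm{Cov}(\mathbf{y}_i, \mathbf{y}_q)$ is twice the covariance of $\mathbf{y}_i$ and $\mathbf{y}_q$, and $\sigma_{g_{iq}}$ and
$\sigma_{e_{iq}}$ are the additive and residual covariances respectively associated with the
covariance of $\mathbf{y}_i$ and $\mathbf{y}_q$. Thus, one way of estimating $\sigma_{g_{iq}}$ and $\sigma_{e_{iq}}$ is by using the
following equation:

$$\mathrm{Cov}(\mathbf{y}_i, \mathbf{y}_q) = 0.5\,\mathrm{Var}(\mathbf{y}_i + \mathbf{y}_q) - 0.5\,\mathrm{Var}(\mathbf{y}_i) - 0.5\,\mathrm{Var}(\mathbf{y}_q), \tag{2.24}$$

for which Eqs. (2.22) and (2.23) can be used. Equations (2.22) to (2.24) are used to
estimate $\mathbf{P}$ and $\mathbf{G}$ in the illustrative examples of this book.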
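The identity in Eq. (2.24) holds for any variance estimator that is quadratic in the data. As a minimal sketch, the Python code below applies it to ordinary sample variances rather than to REML estimates; the data are the T1 and T2 means of the first five genotypes of Table 2.2, and the function names are hypothetical.

```python
from statistics import variance

# T1 and T2 means of the first five genotypes listed in Table 2.2.
t1 = [164.46, 144.39, 157.48, 167.30, 164.11]
t2 = [39.63, 50.77, 48.04, 47.98, 49.89]


def covariance_via_sum(x, y):
    """Covariance recovered from variances only, following the form of Eq. (2.24)."""
    total = [a + b for a, b in zip(x, y)]
    return 0.5 * variance(total) - 0.5 * variance(x) - 0.5 * variance(y)


def covariance_direct(x, y):
    """Usual sample covariance (divisor n - 1), used as a cross-check."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


cov_sum = covariance_via_sum(t1, t2)
cov_direct = covariance_direct(t1, t2)
print(round(cov_sum, 4), round(cov_direct, 4))
```

The practical point is that a routine able to estimate only variance components can still deliver covariances when it is run three times: on each trait and on their sum.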
2.8 Numerical Examples


2.8.1 Simulated Data 


This data set was simulated by Ceron-Rojas et al. (2015) and can be obtained at
http://hdl.handle.net/11529/10199. The data were simulated for eight phenotypic
selection cycles (C0 to C7), each with four traits (T1, T2, T3 and T4), 500 genotypes,
and four replicates for each genotype (Fig. 2.7). The LPSI economic weights for T1,
T2, T3 and T4 were 1, −1, 1, and 1 respectively. Each of the four traits was affected
by a different number of quantitative trait loci (QTLs): 300, 100, 60, and 40, respectively.
The common QTLs affecting the traits generated genotypic correlations of
−0.5, 0.4, 0.3, −0.3, −0.2, and 0.1 between T1 and T2, T1 and T3, T1 and T4, T2 and
T3, T2 and T4, and T3 and T4 respectively. The genotypic value of each plant was
generated based on its haplotypes and the QTL effects for each trait.

Fig. 2.7 Schematic illustration of the steps followed to generate data sets 1 and 2 for the seven
selection cycles using the linear phenotypic selection index and the linear genomic selection index.
Dotted lines indicate the process used to simulate the phenotypic data (according to Ceron-Rojas
et al. 2015)

Simulated data were generated using QU-GENE software (Podlich and Cooper 
1998; Wang et al. 2003). A total of 2500 molecular markers were distributed 
uniformly across 10 chromosomes, whereas 315 QTLs were randomly allocated 
over the ten chromosomes to simulate one maize (Zea mays L.) population. Each 
QTL and molecular marker was biallelic and the QTL additive values ranged from 
0 to 0.5. As QU-GENE uses recombination fraction rather than map distance to 
calculate the probability of crossover events, recombination between adjacent pairs 
of markers was set at 0.0906; for two flanking markers, the QTL was either on the 
first (recombination between the first marker and QTL was equal to 0.0) or the 
second (recombination between the first marker and QTL was equal to 0.0906) 
marker; excluding the recombination fraction between 15 random QTLs and their 
flanking markers, which was set at 0.5, i.e., complete independence (Haldane 1919), 
to simulate linkage equilibrium between 5% of the QTLs and their flanking markers. 
In addition, in every case, two adjacent QTLs were in complete linkage. For each 
trait, the phenotypic value for each of four replications of each plant was obtained
from QU-GENE by setting the per-plot heritability of T1, T2, T3, and T4 at 0.4, 0.6,
0.6, and 0.8 respectively.


2.8.2 Estimated Matrices, LPSI, and Its Parameters 


For this example, we used only cycle C1 data and traits T1, T2, and T3. The
phenotypic and genotypic estimated covariance matrices for traits T1, T2, and T3 were

$$\hat{\mathbf{P}} = \begin{bmatrix} 62.50 & -12.74 & 8.53 \\ -12.74 & 17.52 & -3.38 \\ 8.53 & -3.38 & 12.31 \end{bmatrix} \quad \text{and} \quad \hat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 \\ -12.93 & 13.04 & -3.40 \\ 8.35 & -3.40 & 9.96 \end{bmatrix}$$

respectively, whereas the inverse of matrix $\hat{\mathbf{P}}$ was

$$\hat{\mathbf{P}}^{-1} = \begin{bmatrix} 0.01997 & 0.01251 & -0.01040 \\ 0.01251 & 0.06809 & 0.01005 \\ -0.01040 & 0.01005 & 0.09123 \end{bmatrix}.$$

The estimated heritabilities for T1, T2, and T3 were $\hat{h}_1^2 = 0.579$, $\hat{h}_2^2 = 0.744$, and
$\hat{h}_3^2 = 0.809$ respectively.

According to matrices $\hat{\mathbf{P}}^{-1}$ and $\hat{\mathbf{G}}$, and because $\mathbf{w}' = [1\ \ {-1}\ \ 1]$, the estimated
vector of coefficients was $\hat{\mathbf{b}}' = \mathbf{w}'\hat{\mathbf{G}}\hat{\mathbf{P}}^{-1} = [0.555\ \ {-1.063}\ \ 1.087]$, from which
the estimated LPSI can be written as $\hat{I} = 0.555\,T_1 - 1.063\,T_2 + 1.087\,T_3$. Table 2.2
presents the first 20 genotypes, the means of the three traits (T1, T2 and T3) and the
first 20 estimated unranked LPSI values of the 500 simulated genotypes for cycle C1.
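Evaluating the estimated index for a genotype is a single weighted sum of its trait means. The sketch below, written in Python for illustration, applies the rounded coefficients reported above to the trait means of four genotypes taken from Table 2.2; the names `b_hat`, `trait_means`, and `lpsi_value` are hypothetical, and because the coefficients are rounded to three decimals the results can differ from tabulated values in the second decimal.

```python
# Rounded LPSI coefficients reported in the text for traits T1, T2 and T3.
b_hat = (0.555, -1.063, 1.087)

# Genotype number -> (T1, T2, T3) means, copied from Table 2.2.
trait_means = {
    1: (164.46, 39.63, 34.66),
    2: (144.39, 50.77, 34.65),
    3: (157.48, 48.04, 37.90),
    20: (161.80, 46.58, 37.33),
}


def lpsi_value(coefficients, traits):
    """Estimated index value: the sum of coefficient times trait mean."""
    return sum(c * t for c, t in zip(coefficients, traits))


lpsi = {gid: round(lpsi_value(b_hat, t), 2) for gid, t in trait_means.items()}
for gid, value in lpsi.items():
    print(gid, value)
```

Ranking genotypes then amounts to sorting these values in decreasing order.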
Table 2.2 Number of genotypes, means of the trait (T1, T2 and T3) values, and unranked values of
the LPSI for part of a simulated data set


Means of the trait values Unranked
Number of genotypes T1 T2 T3 LPSI values
1 164.46 39.63 34.66 86.81 
2 144.39 50.77 34.65 63.82 
3 157.48 48.04 37.90 77.52
4 167.30 47.98 30.49 74.97 
5 164.11 49.89 32.03 72.85 
6 166.26 40.44 29.93 81.81 
7 154.59 52.22 30.31 63.22 
8 160.00 42.91 31.23 77.12 
9 158.51 46.32 34.52 76.25 
10 163.63 45.43 35.73 81.35 
11 156.16 46.75 35.58 75.62 
12 171.38 41.17 35.13 89.52 
13 153.17 54.18 36.23 66.79 
14 149.89 52.33 31.13 61.39 
15 159.63 49.01 31.72 70.96 
16 160.70 42.51 32.99 79.85 
17 157.07 45.49 28.4 69.68 
18 167.50 41.69 36.73 88.55 
19 159.17 50.6 36.25 73.93 
20 161.80 46.58 37.33 80.84 


According to the means of the three traits, the first estimated LPSI value was
obtained as

$\hat{I}_1 = 0.555(164.46) - 1.063(39.63) + 1.087(34.66) = 86.81;$

the second estimated LPSI value was obtained as

$\hat{I}_2 = 0.555(144.39) - 1.063(50.77) + 1.087(34.65) = 63.82,$ etc.;

and the 20th estimated LPSI value was obtained as

$\hat{I}_{20} = 0.555(161.80) - 1.063(46.58) + 1.087(37.33) = 80.84.$


This estimation procedure is valid for any number of genotypes. Table 2.3 presents
the 20 genotypes ranked by the estimated LPSI values. Note that if we use 20%
selection intensity for Table 2.2 data, we should select genotypes 12, 18, 1, 6, and
10, because their estimated LPSI values are higher than the remaining LPSI values
for that set of genotypes. Using the idea described in Fig. 2.4, genotypes 12, 18, 1, 6,
and 10 should be in the red zone, whereas the rest of the genotypes are in the white
zone and should be culled. Here, the proportion selected is $q = 0.2$ and
$z = \exp\{-0.5u'^2\}/\sqrt{2\pi} = 0.31$, where $u' = (81.35 - 75.64)/8.11 = 0.704$, 81.35 is the estimated LPSI
value of genotype number 10, 75.64 is the mean of the 20 LPSI values, and 8.11
is the standard deviation of the estimated LPSI values of the 20 genotypes presented
in Tables 2.2 and 2.3.

Table 2.3 Number of genotypes, means of the trait (T1, T2 and T3) values and ranked values of the
LPSI for part of a simulated data set

Means of the trait values Ranked
Number of genotypes T1 T2 T3 LPSI values
12 171.38 41.17 35.13 89.52
18 167.50 41.69 36.73 88.55
1 164.46 39.63 34.66 86.81
6 166.26 40.44 29.93 81.81
10 163.63 45.43 35.73 81.35
20 161.80 46.58 37.33 80.84
16 160.70 42.51 32.99 79.85
3 157.48 48.04 37.90 77.52
8 160.00 42.91 31.23 77.12
9 158.51 46.32 34.52 76.25
11 156.16 46.75 35.58 75.62
4 167.30 47.98 30.49 74.97
19 159.17 50.60 36.25 73.93
5 164.11 49.89 32.03 72.85
15 159.63 49.01 31.72 70.96
17 157.07 45.49 28.40 69.68
13 153.17 54.18 36.23 66.79
2 144.39 50.77 34.65 63.82
7 154.59 52.22 30.31 63.22
14 149.89 52.33 31.13 61.39
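The summary quantities quoted for the 20 listed genotypes can be recomputed from their LPSI values. The following Python sketch is illustrative only: the dictionary `lpsi` holds the 20 index values as ranked in Table 2.3 (genotype 3 is taken as 77.52, which is consistent with its rank and with the reported mean of 75.64), and the cut-off is set at the value of genotype 10, as in the text.

```python
import math
from statistics import mean, stdev

# Genotype number -> estimated LPSI value for the 20 listed genotypes.
lpsi = {
    1: 86.81, 2: 63.82, 3: 77.52, 4: 74.97, 5: 72.85,
    6: 81.81, 7: 63.22, 8: 77.12, 9: 76.25, 10: 81.35,
    11: 75.62, 12: 89.52, 13: 66.79, 14: 61.39, 15: 70.96,
    16: 79.85, 17: 69.68, 18: 88.55, 19: 73.93, 20: 80.84,
}

cutoff = lpsi[10]                                   # lowest index value retained
ranked = sorted(lpsi, key=lpsi.get, reverse=True)   # genotypes by decreasing index
selected = [gid for gid in ranked if lpsi[gid] >= cutoff]

index_mean = mean(lpsi.values())
index_sd = stdev(lpsi.values())                     # sample standard deviation
u_cut = (cutoff - index_mean) / index_sd            # standardized truncation point u'
z = math.exp(-0.5 * u_cut ** 2) / math.sqrt(2.0 * math.pi)   # ordinate at u'

print(selected)
print(round(index_mean, 2), round(index_sd, 2), round(u_cut, 3), round(z, 2))
```

The selected list reproduces genotypes 12, 18, 1, 6, and 10, and the remaining numbers match the mean, standard deviation, standardized cut-off, and ordinate reported in the text.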

Table 2.4 presents 25 genotypes and the means of the three traits obtained from
the 500 simulated genotypes for cycle C1 and ranked by the estimated LPSI values.
In this case, we used 5% selection intensity ($k_I = 2.063$). Also, the last four rows in
Table 2.4 give:


1. The means of traits T1, T2, and T3 (175.46, 39.26, and 38.83 respectively) of the
selected individuals and the mean of the selected LPSI values (97.84).

2. The means of the three traits in the base population (161.88, 45.19, and 34.39) 
and the mean of the LPSI values in the base population (79.18) 

3. The selection differentials for the three traits (13.58, —5.92, and 4.44) and the 
selection differential for the LPSI (18.66) 

4. The LPSI expected genetic gain per trait (9.51, —5.48, and 4.22) and the LPSI 
selection response (19.21). 


The variance of the estimated selection index for the 500 genotypes was
$\hat{V}(\hat{I}) = \hat{\mathbf{b}}'\hat{\mathbf{P}}\hat{\mathbf{b}} = 86.72$, from which the standard deviation of $\hat{I}$ was 9.312. The
Table 2.4 Number of selected genotypes, selected means of the trait (T1, T2 and T3) values and
ranked selected values of the LPSI from one simulated set of 500 genotypes with four repetitions


Means of the trait values Ranked
Number of genotypes T1 T2 T3 LPSI values
353 189.68 38.16 36.13 103.97 
370 178.27 34.38 37.79 103.45 
480 174.84 42.72 45.12 100.66 
300 177.38 39.15 40.34 100.65 
273 181.18 35.94 35.14 100.52 
275 167.94 36.82 42.2 99.92 
148 173.37 37.07 39.62 99.86 
137 185.48 46.48 42.55 99.77 
351 173.79 38.38 40.52 99.68 
236 182.85 37.88 34.96 99.2 
217 175.13 38.48 39.16 98.84 
356 171.09 39.6 41.98 98.47 
167 175.39 38.73 37.73 97.17 
230 169.73 37.1 38.69 96.8 
243 171.9 41.53 41.45 96.29 
55 170.02 36.92 37.76 96.15 
68 172.56 37.18 36.7 96.13 
36 175.8 38.86 36.34 95.75 
164 173.61 38.37 36.42 95.14 
140 170.53 42.52 41.97 95.05 
146 177.4 39.64 35.5 94.89 
432 174.01 40.73 38.26 94.84 
378 176.62 42.69 38.47 94.44 
288 172.14 39.31 37.26 94.23 
386 175.77 42.89 38.81 94.13 
Mean of selected individuals 175.46 39.26 38.83 97.84 
Mean of all individuals 161.88 45.19 34.39 79.18 
Selection differential 13.58 —5.92 4.44 18.66 
Expected genetic gain for 5% 9.51 —5.48 4.22 19.21 


The selection intensity was 5% 


estimated standardized selection differentials for the LPSI can be obtained from
Table A in Falconer and Mackay (1996), where, for 5% selection intensity,
$k_I = 2.063$. This means that the estimated LPSI selection response was
$\hat{R} = 2.063(9.312) = 19.21$, whereas the expected genetic gain per trait, or multi-trait
selection response, was $\hat{\mathbf{E}}' = 2.063\,\frac{\hat{\mathbf{b}}'\hat{\mathbf{G}}}{9.312} = [9.51\ \ {-5.48}\ \ 4.22]$.
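The selection response and the expected genetic gain per trait follow from Eqs. (2.9) and (2.10) once the coefficient vector is available. The Python sketch below is an illustration under stated assumptions: it uses the printed, rounded inverse of the phenotypic matrix, so results may differ from the reported ones in the last decimal, and the helpers `matvec` and `dot` are hypothetical names.

```python
import math

k_I = 2.063   # selection intensity for 5% selected (Falconer and Mackay 1996)
w = [1.0, -1.0, 1.0]
P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
# Inverse of P as printed in the text (rounded to five decimals).
P_inv = [[0.01997, 0.01251, -0.01040],
         [0.01251, 0.06809, 0.01005],
         [-0.01040, 0.01005, 0.09123]]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


b = matvec(P_inv, matvec(G, w))                  # b = P^{-1} G w  (Eq. 2.8)
sigma_I = math.sqrt(dot(b, matvec(P, b)))        # standard deviation of the index
R = k_I * sigma_I                                # selection response (Eq. 2.9)
E = [k_I * x / sigma_I for x in matvec(G, b)]    # gain per trait (Eq. 2.10)

print([round(x, 3) for x in b])
print(round(R, 2), [round(x, 2) for x in E])
```

Weighting the per-trait gains by the economic weights and summing them recovers the selection response up to rounding, which is a quick consistency check on both quantities.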
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2.8.3 ГРУ Efficiency Versus Base Index Efficiency 


The estimated correlation between the LPSI and the net genetic merit was
$\hat{\rho}_{HI} = \hat{\sigma}_I/\hat{\sigma}_H = 0.894$, whereas the estimated
correlation between the base index and the net genetic merit was
$\hat{\rho}_{HI_B} = 0.875$; thus $\hat{\lambda} = \hat{\rho}_{HI}/\hat{\rho}_{HI_B} = 1.0217$
and, by Eq. (2.21), $\hat{p} = 100(\hat{\lambda} - 1) = 2.171$. This means that LPSI
efficiency was only 2.2% higher than the base index efficiency for this data set.
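
As a quick check of the efficiency comparison, the following sketch recomputes the ratio of the two correlations and the percentage gain of Eq. (2.21) from the two values quoted above. The variable names are illustrative.

```python
rho_lpsi = 0.894   # estimated correlation between the LPSI and the net genetic merit
rho_base = 0.875   # estimated correlation between the base index and the net genetic merit

lam = rho_lpsi / rho_base   # ratio of the two accuracies
p = 100 * (lam - 1)         # percentage efficiency gain, as in Eq. (2.21)

print(round(lam, 4))   # 1.0217
print(round(p, 3))     # 2.171
```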

Using the same data set described in Sect. 2.8.1 of this chapter, we conducted 
seven selection cycles (C1 to C7) for the four traits (T1, T2, T3, and T4) using the
LPSI and the BLPSI. These results are presented in Table 2.5. To compare the LPSI 
efficiency versus BLPSI efficiency, we obtained the true selection response of the 
simulated data (second column in Table 2.5) and we estimated the LPSI and BLPSI 
selection response for each selection cycle (third column in Table 2.5); in addition, 
we estimated the LPSI and BLPSI expected genetic gain per trait for each selection 
cycle (columns 4 to 7 in Table 2.5). The first part of Table 2.5 shows the true 
selection response and the estimated values of the LPSI selection response and 
expected genetic gain per trait. In a similar manner, the second part of Table 2.5 
shows the true selection response, the estimated values of the BLPSI selection 


Table 2.5 The LPSI and BLPSI responses (true and estimated) and estimated expected genetic
gain per trait for seven simulated selection cycles

           Selection response     Estimated expected genetic gain per trait
Cycle      True     Estimated     T1      T2       T3      T4
LPSI
1          17.84    17.81         7.90    -4.67    3.33    1.92
2          15.66    15.69         7.06    -3.59    3.17    1.86
3          14.44    14.22         6.67    -3.21    2.82    1.52
4          14.29    14.34         7.53    -3.45    2.07    1.29
5          13.86    13.64         7.14    -2.66    2.51    1.33
6          12.47    12.04         6.23    -2.62    1.98    1.21
7          12.44    11.61         5.38    -2.55    2.47    1.22
Average    14.43    14.19         6.85    -3.25    2.62    1.48
BLPSI
1          17.84    22.15         8.38    -4.40    3.04    1.64
2          15.66    20.49         7.74    -3.33    2.82    1.53
3          14.44    19.33         7.29    -3.00    2.44    1.22
4          14.29    19.49         8.05    -3.17    1.89    1.05
5          13.86    18.93         7.64    -2.53    2.19    1.07
6          12.47    17.72         6.81    -2.40    1.72    0.93
7          12.44    17.28         5.89    -2.35    2.11    0.93
Average    14.43    19.34         7.40    -3.02    2.32    1.19

The selection intensity was 10% (k_I = 1.755)
response, and the expected genetic gain per trait. The average value of the true
selection response was equal to 14.43, whereas the average values of the estimated
LPSI and BLPSI selection response were 14.19 and 19.34 respectively. Note that
14.43 - 14.19 = 0.24, but 19.34 - 14.43 = 4.91. According to this result, the BLPSI
over-estimated the true selection response of the simulated data by 34.7%. Thus,
based on the Table 2.5 results and those presented in Fig. 2.6, we can conclude that 
the LPSI was more efficient than the BLPSI for this data set. 

Finally, additional results can be seen in Chap. 10, where the LPSI was simulated
for many selection cycles. Chapter 11 describes RIndSel: a program that uses R and
the selection index theory to make selections.


2.9 The LPSI and Its Relationship with the Quadratic 
Phenotypic Selection Index 


In the nonlinear selection index theory, the net genetic merit and the index are both 
nonlinear. There are many types of nonlinear indices; Goddard (1983) and Weller et 
al. (1996) have reviewed the general theory of nonlinear selection indices. In this 
chapter, we describe only the simplest of them: the quadratic index developed 
mainly by Wilton et al. (1968), Wilton (1968), and Wilton and Van Vleck (1969), 
which is related to the LPSI. 


2.9.1 The Quadratic Nonlinear Net Genetic Merit 


The most common form of writing the quadratic net genetic merit is

$$H_q = \alpha + w'(\mu + g) + (\mu + g)'A(\mu + g), \tag{2.25}$$

where $\alpha$ is a constant, $g$ is the vector of breeding values, which has normal
distribution with zero mean and covariance matrix $G$, $\mu$ is the vector of population
means, and $w$ is a vector of economic weights. In addition, matrix $A$ can be written as

$$A = \begin{bmatrix} w_{11} & 0.5w_{12} & \cdots & 0.5w_{1t} \\ 0.5w_{12} & w_{22} & \cdots & 0.5w_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ 0.5w_{1t} & 0.5w_{2t} & \cdots & w_{tt} \end{bmatrix},$$

where the $i$th diagonal value $w_{ii}$ ($i = 1, 2, \ldots, t$) is the relative economic
weight of the genetic value of the squared trait $i$ and $w_{ij}$ ($i, j = 1, 2, \ldots, t$)
is the economic weight of the cross products between the genetic values of traits
$i$ and $j$. The main difference between the linear net genetic merit (Eq. 2.2) and the
net quadratic merit (Eq. 2.25) is that the latter depends on $\mu$ and
$(\mu + g)'A(\mu + g)$.
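
To make the structure of the quadratic merit concrete, the sketch below evaluates Eq. (2.25) for a two-trait case. All numerical values (weights, means, breeding values) are hypothetical and chosen only for illustration; the function name `quadratic_merit` is not from the source.

```python
import numpy as np

def quadratic_merit(alpha, w, A, mu, g):
    """Evaluate H_q = alpha + w'(mu + g) + (mu + g)' A (mu + g)."""
    x = mu + g
    return alpha + w @ x + x @ A @ x

# Hypothetical two-trait example
w = np.array([1.0, -1.0])          # linear economic weights
A = np.array([[0.10, 0.05],        # diagonal: weights of squared traits
              [0.05, 0.20]])       # off-diagonal: half the cross-product weight
mu = np.array([10.0, 5.0])         # population means
g = np.array([1.0, -0.5])          # breeding values of one candidate

H_q = quadratic_merit(0.0, w, A, mu, g)
print(round(H_q, 2))   # 27.6 = 6.5 (linear part) + 21.1 (quadratic part)
```

Setting `A` to a zero matrix removes the quadratic part and leaves only the linear merit shifted by the constant `w @ mu`.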
2.9.2 The Quadratic Index


The quadratic phenotypic selection index is

$$I_q = \beta + b'y + y'By, \tag{2.26}$$

where $\beta$ is a constant, $y$ is the vector of phenotypic values that has multivariate
normal distribution with zero mean and covariance matrix $P$,
$b' = [b_1 \;\; b_2 \;\; \cdots \;\; b_t]$ is a vector of coefficients, and

$$B = \begin{bmatrix} b_{11} & 0.5b_{12} & \cdots & 0.5b_{1t} \\ 0.5b_{12} & b_{22} & \cdots & 0.5b_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ 0.5b_{1t} & 0.5b_{2t} & \cdots & b_{tt} \end{bmatrix}.$$

In matrix $B$, the $i$th diagonal value $b_{ii}$ ($i = 1, 2, \ldots, t$) is the index weight
for the square of the phenotypic value of trait $i$ and $b_{ij}$ ($i, j = 1, 2, \ldots, t$)
is the index weight for the cross products between the phenotypes of traits $i$ and $j$.


2.9.3 The Vector and the Matrix of Coefficients 
of the Quadratic Index 


As we saw in Sect. 2.3.2 of this chapter, to obtain the vector ($b$) and the matrix ($B$) of
coefficients of the quadratic index that maximize the selection response, we can
minimize the expectation of the squared difference between the quadratic index ($I_q$)
and the quadratic net genetic merit ($H_q$),
$\Phi = E\{[I_q - E(I_q)] - [H_q - E(H_q)]\}^2$, or we can maximize the correlation
between $I_q$ and $H_q$, i.e.,

$$\rho_{H_qI_q} = \frac{\mathrm{Cov}(H_q, I_q)}{\sqrt{\mathrm{Var}(I_q)}\sqrt{\mathrm{Var}(H_q)}},$$

where $\mathrm{Cov}(H_q, I_q)$ is the covariance between $I_q$ and $H_q$,
$\sqrt{\mathrm{Var}(I_q)}$ is the standard deviation of $I_q$, and
$\sqrt{\mathrm{Var}(H_q)}$ is the standard deviation of $H_q$. In this context, it is
easier to maximize $\rho_{H_qI_q}$ than to minimize $\Phi$. Vandepitte (1972) minimized
$\Phi$, but in this section we shall maximize $\rho_{H_qI_q}$.

Suppose that $\mu = 0$; since $\alpha$ and $\beta$ are constants that do not affect
$\rho_{H_qI_q}$, we can write $I_q$ and $H_q$ as $I_q = b'y + y'By$ and
$H_q = w'g + g'Ag$. Thus, under the assumption that $y$ and $g$ have multivariate normal
distribution with mean 0 and covariance matrix $P$ and $G$, respectively,
$E(I_q) = \mathrm{tr}(BP)$ and $E(H_q) = \mathrm{tr}(AG)$ are the expectations of $I_q$
and $H_q$, whereas $\mathrm{Var}(I_q) = b'Pb + 2\,\mathrm{tr}[(BP)^2]$ and
$\mathrm{Var}(H_q) = w'Gw + 2\,\mathrm{tr}[(AG)^2]$ are the variances of $I_q$ and $H_q$,
respectively. The covariance between $I_q$ and $H_q$ is
$\mathrm{Cov}(H_q, I_q) = w'Gb + 2\,\mathrm{tr}(BGAG)$ (Vandepitte 1972), where
$\mathrm{tr}(\circ)$ denotes the trace function of matrices.

According to the foregoing results, we can maximize the natural logarithm of
$\rho_{H_qI_q}$ [$\ln(\rho_{H_qI_q})$] with respect to vector $b$ and matrix $B$ assuming
that $w$, $A$, $P$, and $G$ are known. Hence, except for two proportional constants that
do not affect the maximum value of $\rho_{H_qI_q}$, because this is invariant to the scale
change, the results of the derivatives of $\ln(\rho_{H_qI_q})$ with respect to $b$ and
$B$ are

$$b = P^{-1}Gw \quad \text{and} \quad B = P^{-1}GAGP^{-1}, \tag{2.27}$$

respectively. In this case, $b = P^{-1}Gw$ is the same as the LPSI vector of coefficients
(see Eq. 2.8 for details); however, when $\mu \neq 0$,
$b = P^{-1}G(w + 2A\mu) = P^{-1}Gw + 2P^{-1}GA\mu$. In the latter case, $b$ has the
additional term $2P^{-1}GA\mu$, which is null when $\mu = 0$ or $A = 0$. Hence, when
$\mu \neq 0$ the quadratic index vector $b$ shall have two components: $P^{-1}Gw$, which
is the LPSI vector of coefficients, and $2P^{-1}GA\mu$, which is a function of the
current population mean $\mu$ multiplied by matrix $A$. Therefore, when $\mu \neq 0$ and
$A \neq 0$, the quadratic index vector $b$ will change when the $\mu$ values change.
However, $\mu$ does not affect matrix $B$.
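
The coefficient formulas of this subsection can be sketched numerically as follows. The matrices `P`, `G`, `A` and the vectors `w`, `mu` below are hypothetical two-trait values chosen only to illustrate the algebra; they are not taken from the book's data.

```python
import numpy as np

# Hypothetical two-trait inputs (illustration only)
P = np.array([[4.0, 1.0], [1.0, 3.0]])       # phenotypic covariance matrix
G = np.array([[2.0, 0.5], [0.5, 1.5]])       # genetic covariance matrix
w = np.array([1.0, 1.0])                     # linear economic weights
A = np.array([[0.10, 0.05], [0.05, 0.20]])   # quadratic economic weights
mu = np.array([10.0, 5.0])                   # current population means

P_inv = np.linalg.inv(P)

b_zero_mean = P_inv @ G @ w                  # b when mu = 0 (the LPSI vector)
B = P_inv @ G @ A @ G @ P_inv                # matrix of quadratic coefficients
b_with_mean = P_inv @ G @ (w + 2 * A @ mu)   # b when mu != 0
extra_term = 2 * P_inv @ G @ A @ mu          # the part of b that depends on mu

print(np.allclose(b_with_mean, b_zero_mean + extra_term))   # True
print(np.allclose(B, B.T))                                  # True: B is symmetric
```

Changing `mu` alters `b_with_mean` but leaves `B` untouched, which is the point made at the end of the paragraph above.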


2.9.4 The Accuracy and Maximized Selection Response of the 
Quadratic Index 


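According to Eq. (2.27) results, $\mathrm{Var}(I_q) = \mathrm{Cov}(H_q, I_q) = b'Pb + 2\,\mathrm{tr}[(BP)^2]$, which
means that the quadratic index accuracy and the maximized selection response can
be written as:

$$\rho_{H_qI_q} = \frac{\sqrt{w'GP^{-1}Gw + 2\,\mathrm{tr}[(P^{-1}GAG)^2]}}{\sqrt{w'Gw + 2\,\mathrm{tr}[(AG)^2]}} \tag{2.28}$$

and

$$R_q = k\sqrt{w'GP^{-1}Gw + 2\,\mathrm{tr}[(P^{-1}GAG)^2]}, \tag{2.29}$$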


respectively, where k is the selection intensity of the quadratic index. Equations 
(2.27) to (2.29) indicate that the LPSI and the quadratic index are related, and the 
only difference between them is the quadratic terms. Wilton et al. (1968) wrote Eq.
(2.29) as: $R_q = k\sqrt{b'Pb} + k\sqrt{2\,\mathrm{tr}[(BP)^2]}$.
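
A short numerical sketch of Eqs. (2.28) and (2.29) follows. The inputs are hypothetical two-trait matrices (with $P - G$ positive definite so that the joint covariance structure is valid), and `k = 1.755` is the 10% selection intensity used elsewhere in the chapter; the function name is illustrative.

```python
import numpy as np

def quadratic_index_accuracy(P, G, w, A, k):
    """Return (rho, R_q, Var(I_q), Cov(H_q, I_q)) at b = P^-1 G w, B = P^-1 G A G P^-1."""
    P_inv = np.linalg.inv(P)
    b = P_inv @ G @ w
    B = P_inv @ G @ A @ G @ P_inv
    BP = B @ P
    AG = A @ G
    var_I = b @ P @ b + 2 * np.trace(BP @ BP)          # Var(I_q)
    cov_HI = w @ G @ b + 2 * np.trace(B @ G @ A @ G)   # Cov(H_q, I_q)
    var_H = w @ G @ w + 2 * np.trace(AG @ AG)          # Var(H_q)
    rho = np.sqrt(var_I) / np.sqrt(var_H)              # Eq. (2.28)
    R_q = k * np.sqrt(var_I)                           # Eq. (2.29)
    return rho, R_q, var_I, cov_HI

# Hypothetical inputs (illustration only)
P = np.array([[4.0, 1.0], [1.0, 3.0]])
G = np.array([[2.0, 0.5], [0.5, 1.5]])
w = np.array([1.0, 1.0])
A = np.array([[0.10, 0.05], [0.05, 0.20]])

rho, R_q, var_I, cov_HI = quadratic_index_accuracy(P, G, w, A, k=1.755)
print(np.isclose(var_I, cov_HI))   # True: Var(I_q) equals Cov(H_q, I_q) at the optimum
print(0.0 < rho < 1.0)             # True for these inputs
```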
Chapter 3
Constrained Linear Phenotypic Selection Indices


Abstract The linear phenotypic selection index (LPSI), the null restricted LPSI 
(RLPSI), and the predetermined proportional gains LPSI (PPG-LPSI) are the main 
phenotypic selection indices used to predict the net genetic merit and select parents 
for the next selection cycle. The LPSI is an unrestricted index, whereas the RLPSI 
and the PPG-LPSI allow restrictions equal to zero and predetermined proportional 
gain restrictions respectively to be imposed on the expected genetic gain values of 
the trait to make some traits change their mean values based on a predetermined level 
while the rest of the trait means remain without restrictions. One additional restricted 
index is the desired gains LPSI (DG-LPSI), which does not require economic 
weights and, in a similar manner to the PPG-LPSI, allows restrictions to be imposed 
on the expected genetic gain values of the trait to make some traits change their mean 
values based on a predetermined level. The aims of RLPSI and PPG-LPSI are to 
maximize the selection response, the expected genetic gains per trait, and provide the 
breeder with an objective rule for evaluating and selecting parents for the next 
selection cycle based on several traits. This chapter describes the theory and practice 
of the RLPSI, PPG-LPSI, and DG-LPSI. We show that the PPG-LPSI is the most 
general index and includes the LPSI and the RLPSI as particular cases. Finally, we 
describe the DG-LPSI as a modification of the PPG-LPSI. We illustrate the theoret- 
ical results of all the indices using real and simulated data. 


3.1 The Null Restricted Linear Phenotypic Selection Index 


Conditions to construct a valid null restricted linear phenotypic selection index
(RLPSI) are the same as those described in Sect. 2.1 of Chap. 2. The main objective
of the RLPSI is to optimize, under some null restrictions, the selection response, to
predict the net genetic merit $H = w'g$ and select the individuals with the highest net
genetic merit values as parents of the next generation. The RLPSI allows restrictions
equal to zero to be imposed on the expected genetic gains of some traits, whereas
other traits increase (or decrease) their expected genetic gains without imposing any
restrictions. The RLPSI solves the LPSI equations subject to the condition that the
covariance between the index and some linear functions of the genotypes involved
be zero, thus preventing selection on the RLPSI from causing any genetic change in
some expected genetic gains of the traits (Cunningham et al. 1970).

Vector $b = P^{-1}Gw$ maximizes the LPSI selection response, expected genetic
gains per trait, and the correlation between the LPSI and $H = w'g$. In this section, we
show that the vector of the RLPSI coefficients, $b_R = Kb$:


1. Maximizes the RLPSI selection response.

2. Imposes null restrictions on the RLPSI expected genetic gains per trait (or multi-
trait selection response).

3. Maximizes the correlation with the true net genetic merit.

4. Minimizes the mean prediction error.



Vector $b_R = Kb$ is a linear transformation of the LPSI vector of coefficients ($b$)
made by the projector matrix $K$. Matrix $K$ is idempotent ($K = K^2$) and projects
$b$ into a space smaller than the original space of $b$ because the restrictions imposed
on the expected genetic gains per trait are equal to zero. The reduction of the space
into which matrix $K$ projects $b$ is equal to the number of null restrictions imposed by
the breeder on the expected genetic gain per trait, or multi-trait selection response
(Cerón-Rojas et al. 2016).

The covariance between the breeding value vector ($g$) and the LPSI ($I = b'y$) is
$\mathrm{Cov}(I, g) = Gb$. Suppose that the breeder is interested in improving only
$(t - r)$ of $t$ ($r < t$) traits, leaving $r$ of them fixed, that is, $r$ expected genetic
gains of the trait are equal to zero for a specific selection cycle. Thus, we want $r$
covariances between the linear combinations of $g$ ($U'g$) and $I = b'y$ to be zero, i.e.,
$\mathrm{Cov}(I, U'g) = U'Gb = 0$, where $U'$ is a matrix with $r$ 1's and $(t - r)$ 0's;
1 indicates that the trait is restricted and 0 that the trait is not restricted. That is,
in the linear combinations of $g$ ($U'g$), 1 is the coefficient of the genotypes that have
covariance equal to zero with the LPSI, whereas the genotypes with coefficient 0 have no
restriction on the expected genetic gains. We can solve this problem by maximizing the
correlation between $I$ and $H$ ($\rho_{HI}$) or minimizing the mean squared difference
between $I$ and $H$ ($E[(H - I)^2]$) under the restriction $U'Gb = 0$.


3.1.1 The Maximized RLPSI Parameters 


In the LPSI context, vector $b = P^{-1}Gw$ minimizes the mean squared difference
between $I$ and $H$, $E[(H - I)^2] = w'Gw + b'Pb - 2w'Gb$. Let $C' = U'G$ and $C'b = 0$;
we need to minimize $E[(H - I)^2]$ with respect to $b$ under the restriction $C'b = 0$.
Thus, assuming that $P$, $G$, $U'$ and $w$ are known, we need to minimize the function

$$V(b, v) = b'Pb + w'Gw - 2w'Gb + 2v'C'b \tag{3.1}$$

with respect to vectors $b$ and $v' = [v_1 \;\; v_2 \;\; \cdots \;\; v_{r-1}]$, where $v$ is
a vector of Lagrange multipliers. The derivative results from $b$ and $v$ are
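$$Pb + Cv = Gw$$

and

$$C'b = 0,$$

or, in matrix notation,

$$\begin{bmatrix} P & C \\ C' & 0 \end{bmatrix} \begin{bmatrix} b \\ v \end{bmatrix} = \begin{bmatrix} Gw \\ 0 \end{bmatrix}. \tag{3.2}$$

In the latter case of Eq. (3.2), the solution is

$$\begin{bmatrix} b_R \\ v \end{bmatrix} = \begin{bmatrix} P & C \\ C' & 0 \end{bmatrix}^{-1} \begin{bmatrix} Gw \\ 0 \end{bmatrix}, \tag{3.3}$$

where $\begin{bmatrix} P & C \\ C' & 0 \end{bmatrix}^{-1}$ is the inverse of matrix
$\begin{bmatrix} P & C \\ C' & 0 \end{bmatrix}$ and $b_R$ is the RLPSI vector of
coefficients. There is a mathematical algorithm (Searle 1966; Schott 2005) for
finding matrix $\begin{bmatrix} P & C \\ C' & 0 \end{bmatrix}^{-1}$. It can be shown that

$$\begin{bmatrix} P & C \\ C' & 0 \end{bmatrix}^{-1} = \begin{bmatrix} P^{-1} - P^{-1}C(C'P^{-1}C)^{-1}C'P^{-1} & P^{-1}C(C'P^{-1}C)^{-1} \\ (C'P^{-1}C)^{-1}C'P^{-1} & -(C'P^{-1}C)^{-1} \end{bmatrix}, \tag{3.4}$$

whence the RLPSI vector of coefficients ($b_R$) that minimizes $E[(H - I)^2]$ and
maximizes $\rho_{HI}$ under the restriction $C'b = 0$ can be written as

$$b_R = Kb, \tag{3.5}$$

where $K = [I - Q]$, $Q = P^{-1}C(C'P^{-1}C)^{-1}C'$ and $b = P^{-1}Gw$; $P^{-1}$ is the
inverse of matrix $P$ and $I$ is an identity matrix $t \times t$. When there are no
restrictions on any traits, $U'$ is a null matrix and $b_R = b = P^{-1}Gw$, the LPSI
vector of coefficients. Thus, the RLPSI includes the LPSI as a particular case.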
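
The derivation above can be checked numerically. The sketch below builds the projector form $b_R = Kb$ and compares it with a direct solution of the bordered system in Eq. (3.2). The three-trait matrices are hypothetical, and the restriction matrix is stored as `U` with one column per restricted trait (so `U.T` plays the role of $U'$); the function name is an illustrative choice.

```python
import numpy as np

def rlpsi_coefficients(P, G, w, U):
    """Return (b, K, b_R), where U is t x r with one column per restricted trait."""
    t = P.shape[0]
    P_inv = np.linalg.inv(P)
    b = P_inv @ G @ w                                      # unrestricted LPSI coefficients
    C = G @ U                                              # C' = U'G, so C = GU (G symmetric)
    Q = P_inv @ C @ np.linalg.inv(C.T @ P_inv @ C) @ C.T   # projector Q
    K = np.eye(t) - Q                                      # projector K = I - Q
    return b, K, K @ b

# Hypothetical three-trait inputs (illustration only)
P = np.array([[10.0, 2.0, 1.0], [2.0, 8.0, 1.0], [1.0, 1.0, 6.0]])
G = np.array([[5.0, 1.0, 0.5], [1.0, 4.0, 0.5], [0.5, 0.5, 3.0]])
w = np.array([1.0, 1.0, 1.0])
U = np.array([[1.0], [0.0], [0.0]])                        # restrict trait 1 only

b, K, b_R = rlpsi_coefficients(P, G, w, U)

# Cross-check: solve [[P, C], [C', 0]] [b; v] = [Gw; 0] directly
C = G @ U
r = U.shape[1]
bordered = np.block([[P, C], [C.T, np.zeros((r, r))]])
rhs = np.concatenate([G @ w, np.zeros(r)])
b_R_direct = np.linalg.solve(bordered, rhs)[: P.shape[0]]

print(np.allclose(b_R, b_R_direct))   # True: both routes give the same coefficients
print(np.allclose(C.T @ b_R, 0.0))    # True: the null restriction holds
```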

According to Eq. (3.5), the RLPSI can be written as

$$I_R = b_R'y, \tag{3.6}$$

whereas the maximized correlation between the RLPSI and the net genetic merit is

$$\rho_{HI_R} = \frac{w'Gb_R}{\sqrt{w'Gw}\sqrt{b_R'Pb_R}}. \tag{3.7}$$

According to conditions for constructing a valid RLPSI, the index $I_R = b_R'y$
should have normal distribution. Using 1 and 2 null restrictions, this assumption is
illustrated in Fig. 3.1 for a real maize (Zea mays) F2 population with 247 lines and
Fig. 3.1 (a) and (b) show the distributions of 247 values of the restricted linear phenotypic
selection index (RLPSI), with one and two restrictions respectively, constructed with the
phenotypic means of four maize (Zea mays) F2 population traits: grain yield (ton ha^-1), plant
height (cm), ear height (cm), and anthesis day (days), evaluated in one environment


four traits, grain yield (ton ha^-1), plant height (cm), ear height (cm), and anthesis
day (days), evaluated in one environment. Figure 3.1 indicates that, in effect, the
RLPSI values approach normal distribution.

Under the null restrictions made by the breeder, $I_R = b_R'y$ should have maximum
correlation with $H = w'g$ and should be useful for ranking and selecting among
individuals with different net genetic merit; however, $\rho_{HI_R}$ is lower than the
correlation between LPSI and $H = w'g$ ($\rho_{HI}$) in each selection cycle because when
the restriction $C'b = 0$ is imposed on the RLPSI vector of coefficients, the restricted
traits do not affect the correlation $\rho_{HI_R}$. Using simulated data described in
Sect. 2.8.1 of Chap. 2, we estimated $\rho_{HI_R}$ and $\rho_{HI}$ for seven selection
cycles and compared the results in Fig. 3.2. Correlation $\rho_{HI_R}$ values were
estimated for one, two, and three null restrictions and in effect, they were lower than
the estimated values of $\rho_{HI}$ in all selection cycles (Fig. 3.2). Additional results
can be seen in Chap. 10, where the RLPSI was simulated for many selection cycles.
Chapter 11 describes RIndSel: a program that uses R (in this case R denotes a platform
for data analysis, see Kabakoff 2011 for details) and the selection index theory to
select individual candidates for selection.
Fig. 3.2 Estimated correlation values between the linear phenotypic selection index (LPSI) and the
net genetic merit ($H = w'g$); estimated correlation values between the RLPSI and $H$ for one (red),
two (yellow), and three (green) restrictions for four traits and 500 genotypes in one environment
simulated for seven selection cycles


The maximized RLPSI selection response and the restricted expected genetic gain
per trait can be written as

$$R_R = k_I\sqrt{b_R'Pb_R} \tag{3.8}$$

and

$$E_R = k_I\frac{Gb_R}{\sqrt{b_R'Pb_R}}, \tag{3.9}$$

respectively, where $k_I$ is the standardized selection differential or selection intensity
associated with the RLPSI.

The maximized RLPSI selection response has the same form as the maximized
LPSI selection response; thus, under $r$ restrictions, Eq. (3.8) predicts the mean
improvement in $H$ owing to indirect selection on $I_R = b_R'y$ when $b_R = Kb$. The
restriction effects are observed on the RLPSI expected genetic gains per trait
(Eq. 3.9) where each restricted trait has an expected genetic gain equal to zero. In
addition, because the RLPSI selection response and expected genetic gain per trait
values are also affected by the restricted traits, they are lower than the LPSI selection
response and expected genetic gain per trait values.
3.1.2 Statistical Properties of the RLPSI


Under the assumptions that $H = w'g$ and $I_R = b_R'y$ have a bivariate joint normal
distribution, $b_R = Kb$, $b = P^{-1}Gw$, and $P$, $G$, and $w$ are known, the RLPSI has the
following properties:


1. Matrices $Q = P^{-1}C(C'P^{-1}C)^{-1}C'$ and $K = [I - Q]$ are projectors. That is, $Q$ and
$K$ are idempotent ($Q = Q^2$ and $K = K^2$) and orthogonal ($KQ = QK = 0$). It can
be shown that $Q = Q^2$, $K = K^2$, and $KQ = QK = 0$ noting that
$Q^2 = P^{-1}C(C'P^{-1}C)^{-1}C'P^{-1}C(C'P^{-1}C)^{-1}C' = P^{-1}C(C'P^{-1}C)^{-1}C' = Q$,
$K^2 = [I - Q][I - Q] = I - 2Q + Q^2 = I - Q = K$, and $KQ = QK = Q - Q^2 = 0$.

2. Matrix $Q$ projects vector $b$ into a space generated by the columns of matrix $C$ owing
to the restriction $C'b = 0$ used when $V(b, v)$ is minimized with respect to $b$ and $v$.

3. Matrix $K$ projects $b$ into a space perpendicular to the space generated by the
$C$ matrix columns (Rao 2002).

4. Because of the restriction $C'b = 0$, matrix $K$ projects $b$ into a space smaller than
the original space of $b$. The space reduction into which matrix $K$ projects $b$ is
equal to the number of zeros that appears in Eq. (3.9).

5. Vector $b_R = Kb$ minimizes the mean square error under the restriction $C'b = 0$.

6. The variance of $I_R = b_R'y$ ($\sigma^2_{I_R} = b_R'Pb_R$) is equal to the covariance
between $I_R = b_R'y$ and $H = w'g$ ($\sigma_{HI_R} = w'Gb_R$). First note that $K = K^2$,
$K'P = PK$, and $b'P = w'G$; then
$\sigma^2_{I_R} = b_R'Pb_R = b'K'PKb = b'PK^2b = b'PKb = w'Gb_R = \sigma_{HI_R}$.

7. The maximized correlation between $H$ and $I_R$ is equal to
$\rho_{HI_R} = \sigma_{I_R}/\sigma_H$. In point 6 of this subsection we showed that
$\sigma_{HI_R} = \sigma^2_{I_R}$; then

$$\rho_{HI_R} = \frac{w'Gb_R}{\sqrt{w'Gw}\sqrt{b_R'Pb_R}} = \frac{\sqrt{b_R'Pb_R}}{\sqrt{w'Gw}} = \frac{\sigma_{I_R}}{\sigma_H}.$$

8. The variance of the predicted error, $\mathrm{Var}(H - I_R) = (1 - \rho^2_{HI_R})\sigma^2_H$,
is minimal. By point 6, $\sigma_{HI_R} = \sigma^2_{I_R}$, whence
$\mathrm{Var}(H - I_R) = \sigma^2_H - \sigma^2_{I_R} = (1 - \rho^2_{HI_R})\sigma^2_H$.

9. RLPSI heritability is equal to $h^2_{I_R} = \dfrac{b_R'Gb_R}{b_R'Pb_R}$.


Points 1—4 show that in effect, the RLPSI projects the LPSI vector of coefficients 
into a space smaller than the original LPSI vector of coefficients. In addition, the 
RLPSI statistical properties denoted by points 5-9 are the same as the LPSI 
statistical properties. Thus, the RLPSI is a variant of the LPSI. 
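
Several of the properties listed in this subsection can be confirmed numerically. The sketch below uses randomly generated positive definite matrices as stand-ins for $P$ and $G$ (an assumption made only for illustration), restricts the first two of four traits, and checks points 1, 6, and 8.

```python
import numpy as np

rng = np.random.default_rng(0)
t, r = 4, 2
X = rng.normal(size=(t, t))
Y = rng.normal(size=(t, t))
G = X @ X.T + np.eye(t)            # hypothetical genetic covariance (positive definite)
P = G + Y @ Y.T + np.eye(t)        # hypothetical phenotypic covariance, P - G positive definite
w = np.array([1.0, -1.0, 1.0, 1.0])
U = np.eye(t)[:, :r]               # restrict the first r traits (U.T plays the role of U')

P_inv = np.linalg.inv(P)
C = G @ U
Q = P_inv @ C @ np.linalg.inv(C.T @ P_inv @ C) @ C.T
K = np.eye(t) - Q
b_R = K @ (P_inv @ G @ w)

var_I = b_R @ P @ b_R              # variance of the RLPSI
cov_HI = w @ G @ b_R               # covariance between H and the RLPSI
var_H = w @ G @ w                  # variance of the net genetic merit
rho = np.sqrt(var_I / var_H)       # point 7
pred_err = var_H + var_I - 2 * cov_HI   # Var(H - I_R) computed directly

checks = {
    "Q idempotent": np.allclose(Q @ Q, Q),
    "K idempotent": np.allclose(K @ K, K),
    "KQ = QK = 0": np.allclose(K @ Q, 0.0) and np.allclose(Q @ K, 0.0),
    "K'P = PK": np.allclose(K.T @ P, P @ K),
    "Var(I_R) = Cov(H, I_R)": np.isclose(var_I, cov_HI),
    "Var(H - I_R) = (1 - rho^2) Var(H)": np.isclose(pred_err, (1 - rho**2) * var_H),
}
for name, ok in checks.items():
    print(name, ok)
```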


3.1.3 The RLPSI Matrix of Restrictions 


The main difference between the RLPSI and the LPSI is the restriction $U'Gb = 0$
used to obtain the RLPSI vector of coefficients. This restriction is introduced through
matrix $U'$ $(t - 1) \times t$, which is called matrix of null restrictions and is very
important in an RLPSI context. The form and size of matrix $U'$ depends on the number of
restricted traits. For example, suppose that we restrict only one of $t$ traits; then we
can restrict the first of them as $U' = [1 \;\; 0 \;\; 0 \;\; \cdots \;\; 0]$, the second as
$U' = [0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0]$, the third as
$U' = [0 \;\; 0 \;\; 1 \;\; \cdots \;\; 0]$, etc. When we restrict two of $t$ traits,
matrix $U'$ could be constructed as follows. We can restrict the first and second traits as
$U' = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \end{bmatrix}$, the
first and third traits as
$U' = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \end{bmatrix}$, the
second and third traits as
$U' = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \end{bmatrix}$, etc.
If we restrict three of $t$ traits, matrix $U'$ will have the following form when the
first, second, and third traits are restricted,
$U' = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}$;
if the first, second, and fourth traits are restricted,
$U' = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \end{bmatrix}$;
if the second, the third and the fourth traits are restricted,
$U' = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \end{bmatrix}$,
etc. The procedure to construct matrix $U'$ is valid for any number of restricted traits.

There are $\sum_{r=0}^{t} \binom{t}{r} = 2^t$ (Leon-Garcia 2008) possible forms for
constructing matrix $U'$, where $\binom{t}{r} = \dfrac{t!}{r!(t - r)!}$ and
$t! = t(t - 1)(t - 2)(t - 3) \cdots (t - (t - 1))$. Note, however, that when $r = 0$,
$U'$ is a null matrix, and when $r = t$, all traits are restricted and then the RLPSI
values are null. Thus, the breeder should be interested only in $2^t - 2$ possible ways
of constructing matrix $U'$.
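
The construction rule and the count of useful restriction matrices can be sketched as follows. The helper `restriction_matrix` is a hypothetical name introduced for illustration; traits are indexed from 0, so `(0, 1)` means the first and second traits.

```python
from itertools import combinations

import numpy as np

def restriction_matrix(t, restricted):
    """Build U' (r x t): row q has a 1 in the column of the q-th restricted trait."""
    U_t = np.zeros((len(restricted), t), dtype=int)
    for row, trait in enumerate(restricted):
        U_t[row, trait] = 1
    return U_t

t = 4
# All restriction sets with 1 <= r <= t - 1 restricted traits
useful = [traits for r in range(1, t) for traits in combinations(range(t), r)]

print(len(useful))                      # 14, i.e. 2**4 - 2
print(restriction_matrix(t, (0, 1)))    # first and second traits restricted
```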


3.1.4 Numerical Examples 


To illustrate the RLPSI theoretical results, we use the data set described in Sect. 2.8.1
of Chap. 2. We used that data set for seven phenotypic selection cycles (C1 to C7),
each with four traits (T1, T2, T3 and T4), 500 genotypes and four replicates for each
genotype. The economic weights for T1, T2, T3, and T4 were 1, -1, 1, and 1 respectively.
The estimated phenotypic ($\hat{P}$) and genetic ($\hat{G}$) covariance matrices for traits
T1, T2, T3, and T4 obtained for the first selection cycle (C1) of the simulated data were

$$\hat{P} = \begin{bmatrix} 62.50 & -12.74 & 8.53 & 2.73 \\ -12.74 & 17.52 & -3.38 & -2.28 \\ 8.53 & -3.38 & 12.31 & 0.16 \\ 2.73 & -2.28 & 0.16 & 7.27 \end{bmatrix} \quad \text{and} \quad \hat{G} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \\ 8.35 & -3.40 & 9.96 & 0.16 \\ 2.74 & -2.24 & 0.16 & 6.64 \end{bmatrix},$$

respectively. We can restrict T1 with matrix $U_1' = [1 \;\; 0 \;\; 0 \;\; 0]$; T1 and T2
with matrix $U_2' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$, and T1, T2
and T3 with matrix
$U_3' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$.

Matrix $C' = U'G$ associated with $U_1'$, $U_2'$, and $U_3'$ can be obtained as
$\hat{C}_1' = U_1'\hat{G} = [36.21 \;\; -12.93 \;\; 8.35 \;\; 2.74]$,

$$\hat{C}_2' = U_2'\hat{G} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \end{bmatrix}, \quad \text{and} \quad \hat{C}_3' = U_3'\hat{G} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \\ 8.35 & -3.40 & 9.96 & 0.16 \end{bmatrix}.$$

The estimated LPSI vector of coefficients was

$$\hat{b}' = w'\hat{G}\hat{P}^{-1} = [0.55 \;\; -1.05 \;\; 1.09 \;\; 1.06].$$
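The estimated matrices $\hat{Q} = \hat{P}^{-1}\hat{C}(\hat{C}'\hat{P}^{-1}\hat{C})^{-1}\hat{C}'$
and $\hat{K} = [I_4 - \hat{Q}]$ (where $I_4$ is an identity matrix $4 \times 4$) for 1 null
restriction, were

$$\hat{Q}_1 = \hat{P}^{-1}\hat{C}_1(\hat{C}_1'\hat{P}^{-1}\hat{C}_1)^{-1}\hat{C}_1' = \begin{bmatrix} 0.72 & -0.26 & 0.17 & 0.05 \\ -0.51 & 0.18 & -0.12 & -0.04 \\ 0.30 & -0.14 & 0.09 & 0.03 \\ 0.14 & -0.05 & 0.03 & 0.01 \end{bmatrix}$$

and

$$\hat{K}_1 = [I_4 - \hat{Q}_1] = \begin{bmatrix} 0.28 & 0.26 & -0.17 & -0.05 \\ 0.51 & 0.82 & 0.12 & 0.04 \\ -0.30 & 0.14 & 0.91 & -0.03 \\ -0.14 & 0.05 & -0.03 & 0.99 \end{bmatrix}.$$

Thus, the estimated RLPSI vector of coefficients was
$\hat{b}_{R_1}' = (\hat{K}_1\hat{b})' = [-0.35 \;\; -0.41 \;\; 0.59 \;\; 0.89]$, whence the
estimated RLPSI for 1 null restriction can be written as
$\hat{I}_{R_1} = -0.35\,T_1 - 0.41\,T_2 + 0.59\,T_3 + 0.89\,T_4$.
The average values of T1, T2, T3, and T4 were 164.46, 39.63, 34.66, and 23.11
(Table 3.1) respectively; then,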
Table 3.1 Ten genotypes, mean values of four traits, and unranked and ranked values of the
restricted linear phenotypic selection index (RLPSI) obtained from 500 simulated genotypes (each
with four repetitions) and four traits (T1, T2, T3, and T4) in one environment for one selection cycle

                       Means of the trait values
Number of genotypes    T1        T2       T3       T4       RLPSI values
Unranked
1                      164.46    39.63    34.66    23.11    -33.24
2                      144.39    50.77    34.65    19.56    -33.94
3                      157.48    48.04    37.90    19.03    -35.96
4                      167.30    47.98    30.49    24.75    -38.73
5                      164.11    49.89    32.03    25.32    -36.98
6                      166.26    40.44    29.93    20.55    -39.29
7                      154.59    52.22    30.31    18.86    -41.33
8                      160.00    42.91    31.23    20.95    -36.98
9                      158.51    46.32    34.52    18.36    -38.20
10                     163.63    45.43    35.73    19.57    -37.85
Ranked
1                      164.46    39.63    34.66    23.11    -33.24
2                      144.39    50.77    34.65    19.56    -33.94
3                      157.48    48.04    37.90    19.03    -35.96
5                      164.11    49.89    32.03    25.32    -36.98
8                      160.00    42.91    31.23    20.95    -36.98
10                     163.63    45.43    35.73    19.57    -37.85
9                      158.51    46.32    34.52    18.36    -38.20
4                      167.30    47.98    30.49    24.75    -38.73
6                      166.26    40.44    29.93    20.55    -39.29
7                      154.59    52.22    30.31    18.86    -41.33


$\hat{I}_{R_1} = -0.35(164.46) - 0.41(39.63) + 0.59(34.66) + 0.89(23.11) = -33.24$.


In Table 3.1 we present ten genotypes, the mean values of four traits, and the 
unranked and ranked values of the RLPSI from 500 genotypes in one environment 
simulated for one selection cycle. The first part of Table 3.1 presents the ten 
unranked genotypes, whereas the second part presents the ten genotypes ranked by 
the estimated RLPSI values. 
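
The ranking step in Table 3.1 amounts to sorting genotypes by their index values in decreasing order. A minimal sketch using the ten RLPSI values from the table follows; the dictionary layout is an illustrative choice.

```python
# RLPSI values of the ten genotypes in Table 3.1, keyed by genotype number
rlpsi = {1: -33.24, 2: -33.94, 3: -35.96, 4: -38.73, 5: -36.98,
         6: -39.29, 7: -41.33, 8: -36.98, 9: -38.20, 10: -37.85}

# Sort from highest to lowest index value; ties keep their original order
ranked = sorted(rlpsi, key=lambda genotype: rlpsi[genotype], reverse=True)

print(ranked)   # [1, 2, 3, 5, 8, 10, 9, 4, 6, 7]
```

Selecting parents at a given intensity then reduces to taking the leading fraction of `ranked`.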

Assuming a selection intensity of 10% ($k_I = 1.755$), the estimated selection
response and the estimated expected genetic gain per trait for 1 null restriction
were $\hat{R}_{R_1} = 1.755\sqrt{\hat{b}_{R_1}'\hat{P}\hat{b}_{R_1}} = 6.87$ and
$\hat{E}_{R_1}' = 1.755\,\dfrac{\hat{b}_{R_1}'\hat{G}}{\sqrt{\hat{b}_{R_1}'\hat{P}\hat{b}_{R_1}}} = [0 \;\; -2.2 \;\; 2.03 \;\; 2.66]$,
respectively, and the estimated correlation between the RLPSI and the
net genetic merit was
$\hat{\rho}_{HI_{R_1}} = \dfrac{\sqrt{\hat{b}_{R_1}'\hat{P}\hat{b}_{R_1}}}{\sqrt{w'\hat{G}w}} = 0.35$.

In a similar manner to that for 1 null restriction, it is possible to obtain the
estimated matrices $\hat{Q} = \hat{P}^{-1}\hat{C}(\hat{C}'\hat{P}^{-1}\hat{C})^{-1}\hat{C}'$
and $\hat{K} = [I_4 - \hat{Q}]$, and the estimated RLPSI vector of coefficients for 2 and
3 null restrictions. Thus, for 2 and 3 null restrictions, the estimated selection
responses were $\hat{R}_{R_2} = 1.755\sqrt{\hat{b}_{R_2}'\hat{P}\hat{b}_{R_2}} = 5.54$
and $\hat{R}_{R_3} = 1.755\sqrt{\hat{b}_{R_3}'\hat{P}\hat{b}_{R_3}} = 4.12$ respectively,
whereas the estimated expected genetic gains per trait were
$\hat{E}_{R_2}' = 1.755\,\dfrac{\hat{b}_{R_2}'\hat{G}}{\sqrt{\hat{b}_{R_2}'\hat{P}\hat{b}_{R_2}}} = [0 \;\; 0 \;\; 2.773 \;\; 2.768]$
and
$\hat{E}_{R_3}' = 1.755\,\dfrac{\hat{b}_{R_3}'\hat{G}}{\sqrt{\hat{b}_{R_3}'\hat{P}\hat{b}_{R_3}}} = [0 \;\; 0 \;\; 0 \;\; 4.12]$.


Note that the estimated RLPSI selection response decreased when the number of
restrictions increased. Also, the number of zeros in the expected genetic gain per trait
increased from 1 to 3 depending on the number of null restrictions. The estimated
correlation between the RLPSI and the net genetic merit likewise decreased as the
number of restrictions increased (Fig. 3.2).
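
The numerical example can be retraced with the estimated matrices reported above for cycle C1. This is a sketch, not the authors' program: the function name `rlpsi_summary` is an assumption, and because the covariance matrices are entered rounded to two decimals, the printed values may differ slightly from those quoted in the text (6.87, 5.54, and 4.12 for the selection responses).

```python
import numpy as np

# Estimated covariance matrices for cycle C1, as reported in the text
P = np.array([[62.50, -12.74, 8.53, 2.73],
              [-12.74, 17.52, -3.38, -2.28],
              [8.53, -3.38, 12.31, 0.16],
              [2.73, -2.28, 0.16, 7.27]])
G = np.array([[36.21, -12.93, 8.35, 2.74],
              [-12.93, 13.04, -3.40, -2.24],
              [8.35, -3.40, 9.96, 0.16],
              [2.74, -2.24, 0.16, 6.64]])
w = np.array([1.0, -1.0, 1.0, 1.0])   # economic weights
k_I = 1.755                           # 10% selection intensity

def rlpsi_summary(r):
    """Selection response and expected gains when the first r traits are restricted."""
    t = len(w)
    U = np.eye(t)[:, :r]              # columns select the restricted traits
    C = G @ U
    P_inv = np.linalg.inv(P)
    K = np.eye(t) - P_inv @ C @ np.linalg.inv(C.T @ P_inv @ C) @ C.T
    b_R = K @ (P_inv @ G @ w)
    sd = np.sqrt(b_R @ P @ b_R)
    return k_I * sd, k_I * (G @ b_R) / sd

results = {r: rlpsi_summary(r) for r in (1, 2, 3)}
for r, (response, gains) in results.items():
    print(r, round(response, 2), np.round(gains, 2))
```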

Table 3.2 presents the estimated LPSI selection response and its heritabilities, and
the estimated RLPSI selection response and its heritabilities for 1, 2, and 3 null
restrictions for seven simulated selection cycles using a selection intensity of 10%
($k_I = 1.755$). Note that the averages of the estimated RLPSI selection response for
the seven selection cycles were 6.76, 5.30, and 3.70 for 1, 2, and 3 null restrictions
respectively, and that 3.70, the average value for 3 null restrictions, is only 54.73%
of the average value for 1 null restriction (6.76). However, the estimated RLPSI
heritabilities for 1, 2, and 3 null restrictions tend to increase. This is because the
simulated true heritabilities of traits T1, T2, T3, and T4 were 0.4, 0.6, 0.6, and 0.8
respectively, whereas the averages of the estimated heritabilities of traits T1, T2, T3,
and T4 were 0.70, 0.78, and 0.87 for 1, 2, and 3 null restrictions respectively.

Table 3.3 presents the estimated LPSI expected genetic gain per trait and the 
estimated RLPSI expected genetic gain per trait for 1, 2, and 3 null restrictions for 


Table 3.2 Estimated linear phenotypic selection index (LPSI) selection response and its heritabil-
ity, and estimated restricted LPSI (RLPSI) selection response and its heritability for one, two, and
three null restrictions for seven simulated selection cycles

           LPSI                               RLPSI selection response for     RLPSI heritability for
                                              one, two, and three              one, two, and three
                                              restrictions                     restrictions
Cycle      Selection response   Heritability  1       2       3                1       2       3
1          17.81                                              4.13             0.65    0.77    0.89
2          15.69                                              4.27             0.76    0.80    0.90
3          14.22                                              4.16             0.71    0.80    0.88
4          14.34                                              3.72             0.71    0.79    0.89
5          13.64                0.75          6.02    5.16    3.24             0.67    0.76    0.86
6          12.04                0.71          6.37    5.17    3.31             0.70    0.79    0.86
7          11.61                0.72          5.77    4.44    3.09             0.68    0.74    0.84
Average    14.19                0.76          6.76    5.30    3.70             0.70    0.78    0.87

The selection intensity was 10% (k_I = 1.755)
Table 3.3 Estimated LPSI expected genetic gain per trait, and estimated RLPSI expected genetic
gain per trait for one, two, and three null restrictions for seven simulated selection cycles

                                              RLPSI expected gain per trait
           LPSI expected gain per trait       for one restriction
Cycle      T1      T2       T3      T4        T1    T2       T3      T4
1          7.90    -4.67    3.33    1.92      0     -2.18    2.03    2.66
2          7.06    -3.59    3.17    1.86      0     -3.41    2.33    2.71
3          6.67    -3.21    2.82    1.52      0     -2.30    3.12    1.74
4          7.53    -3.45    2.07    1.29      0     -2.88    1.42    2.38
5          7.14    -2.66    2.51    1.33      0     -1.83    2.38    1.81
6          6.23    -2.62    1.98    1.21      0     -2.41    2.09    1.87
7          5.38    -2.55    2.47    1.22      0     -2.24    1.34    2.19
Average    6.85    -3.25    2.62    1.48      0     -2.46    2.10    2.19

           RLPSI expected gain per trait      RLPSI expected gain per trait
           for two restrictions               for three restrictions
Cycle      T1    T2    T3      T4             T1    T2    T3    T4
1          0     0     2.77    2.77           0     0     0     4.13
2          0     0     2.87    3.07           0     0     0     4.27
3          0     0     3.11    2.68           0     0     0     4.16
4          0     0     2.35    2.70           0     0     0     3.72
5          0     0     3.12    2.04           0     0     0     3.24
6          0     0     2.84    2.33           0     0     0     3.31
7          0     0     2.07    2.37           0     0     0     3.09
Average    0     0     2.73    2.57           0     0     0     3.70

The selection intensity was 10% (k_I = 1.755)


seven simulated selection cycles using a selection intensity of 10% ($k_I = 1.755$). In
effect, due to the restriction $C'b = 0$, matrix $K$ projects $b$ into a space smaller than
the original space of $b$ and the space reduction into which matrix $K$ projects $b$ is
equal to the number of zeros that appear in the RLPSI expected genetic gain per trait.

It can be shown that in the three restrictions case (Table 3.3) the estimated RLPSI
expected genetic gain per trait (or multi-trait selection response) is equal to the one
trait selection response (Eqs. 2.4 and 2.5) when only trait T4 is selected. This means
that in effect, when we imposed three restrictions on the RLPSI expected genetic gains
per trait, we reduced one space of four dimensions to one space of only one dimension.


3.2 The Predetermined Proportional Gains Linear 
Phenotypic Selection Index 


This index is called the predetermined proportional gains phenotypic selection index 
(PPG-LPSI) because the breeder pre-sets optimal levels for certain traits before the 
selection is carried out. The conditions for constructing a valid PPG-LPSI are the 
same as those described for the LPSI in Sect. 2.1 of Chap. 2. Some of the main 
objectives of the PPG-LPSI are to optimize the expected genetic gain per trait,
predict the net genetic merit $H = w'g$, and select the individuals with the highest
net genetic merit values as parents of the next generation. The PPG-LPSI allows 
restrictions different from zero to be imposed on the expected genetic gains of some 
traits, whereas other traits increase (or decrease) their expected genetic gains without 
imposing any restrictions. The PPG-LPSI solves the LPSI equations subject to the 
condition that the covariance between the LPSI and some linear functions of the 
genotypes involved be equal to a vector of predetermined constants or genetic gains 
defined by the breeder (Cunningham et al. 1970). 

Let $\mathbf{d}' = [d_1\ d_2\ \cdots\ d_r]$ be a vector $r \times 1$ of the predetermined proportional
gains and assume that $\mu_q$ is the population mean of the $q$th trait before selection. One
objective could be to change $\mu_q$ to $\mu_q + d_q$, where $d_q$ is a predetermined change in $\mu_q$
(in the RLPSI, $d_q = 0$, $q = 1, 2, \ldots, r$, where $r$ is the number of predetermined
proportional gains). We can solve this problem in a similar manner to that used with
the RLPSI, that is, by minimizing the mean squared difference between $I$ and $H$
($E[(H - I)^2]$) under the restriction $\mathbf{D}'\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{0}$, where

$$\mathbf{D}' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix}$$

is a Mallard (1972) matrix $(r - 1) \times r$ of predetermined proportional gains, $d_q$
($q = 1, 2, \ldots, r$) is the $q$th element of vector $\mathbf{d}$, $\mathbf{U}'$ is the RLPSI matrix of
restrictions of 1's and 0's described earlier in this chapter, $\mathbf{G}$ is the covariance
matrix of genotypic values, and $\mathbf{b}$ is the LPSI vector of coefficients. Also, it is
possible to minimize $E[(H - I)^2]$ under the restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$ (Tallis 1985),
where $\theta$ is a proportionality constant, which is a scalar to be determined a posteriori
(Lin 2005), that is, $\theta$ is indeterminate a priori (Itoh and Yamada 1987). Both
approaches are very similar, but the equations obtained when introducing the
$\mathbf{D}'\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{0}$ restriction are simpler than when introducing the $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$
restriction into the process of minimizing $E[(H - I)^2]$. The $\mathbf{D}'\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{0}$ restriction
leads to a set of equations similar to Eq. (3.5), whereas the $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$ restriction
leads to a set of equations that are difficult to solve.
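
As a minimal sketch of how this Mallard (1972) matrix can be assembled, the helper below (the name `mallard_matrix` is ours, not from the cited works) places $d_r$ on the diagonal and $-d_q$ in the last column, and then checks that $\mathbf{D}'\mathbf{d} = \mathbf{0}$ for the vector $\mathbf{d}' = [7\ -3\ 5]$ used in this chapter. It assumes $d_r \neq 0$.

```python
import numpy as np

def mallard_matrix(d):
    """Return D' of shape (r-1, r): d_r on the diagonal, -d_q in the last column."""
    d = np.asarray(d, dtype=float)
    r = d.size
    D_t = np.zeros((r - 1, r))
    D_t[:, :-1] = d[-1] * np.eye(r - 1)   # d_r repeated along the diagonal
    D_t[:, -1] = -d[:-1]                  # -d_1, ..., -d_{r-1} in the last column
    return D_t

d = np.array([7.0, -3.0, 5.0])
D_t = mallard_matrix(d)
print(D_t)        # rows (5, 0, -7) and (0, 5, 3)
print(D_t @ d)    # D'd, a vector of zeros
```

For $\mathbf{d}' = [7\ -3]$ the same helper returns the single row $[-3\ -7]$.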




3.2.1 The Maximized PPG-LPSI Parameters 


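Let $\mathbf{M}' = \mathbf{D}'\mathbf{C}'$ be the Mallard (1972) matrix of predetermined restrictions, where
$\mathbf{C}' = \mathbf{U}'\mathbf{G}$. Under the restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$, we can minimize $E[(I - H)^2]$, assuming
that $\mathbf{P}$, $\mathbf{G}$, $\mathbf{U}'$, $\mathbf{D}'$, and $\mathbf{w}$ are known; that is, we need to minimize the function

$$\Phi(\mathbf{b}, \mathbf{v}) = \mathbf{b}'\mathbf{P}\mathbf{b} + \mathbf{w}'\mathbf{G}\mathbf{w} - 2\mathbf{w}'\mathbf{G}\mathbf{b} + 2\mathbf{v}'\mathbf{M}'\mathbf{b} \qquad (3.10)$$

with respect to vectors $\mathbf{b}$ and $\mathbf{v}' = [v_1\ v_2\ \cdots\ v_{r-1}]$, where $\mathbf{v}$ is a vector of
Lagrange multipliers. Note that the only difference between Eqs. (3.1) and (3.10) is
matrix $\mathbf{D}'$, and that matrix $\mathbf{M}' = \mathbf{D}'\mathbf{C}'$ has the same function in Eq. (3.10) that matrix
$\mathbf{C}' = \mathbf{U}'\mathbf{G}$ had in Eq. (3.1). Then, the derivatives of Eq. (3.10) with respect to $\mathbf{b}$ and
$\mathbf{v}$ should be similar to those of Eq. (3.1), i.e.,

$$\begin{bmatrix} \mathbf{P} & \mathbf{M} \\ \mathbf{M}' & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{b} \\ \mathbf{v} \end{bmatrix} = \begin{bmatrix} \mathbf{G}\mathbf{w} \\ \mathbf{0} \end{bmatrix},$$

whence the vector that minimizes $E[(H - I)^2]$ under the restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$ is

$$\mathbf{b}_M = \mathbf{K}_M\mathbf{b}, \qquad (3.11)$$

where $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ is the LPSI vector of coefficients, $\mathbf{K}_M = [\mathbf{I}_t - \mathbf{Q}_M]$,
$\mathbf{Q}_M = \mathbf{P}^{-1}\mathbf{M}(\mathbf{M}'\mathbf{P}^{-1}\mathbf{M})^{-1}\mathbf{M}' = \mathbf{P}^{-1}\mathbf{C}\mathbf{D}(\mathbf{D}'\mathbf{C}'\mathbf{P}^{-1}\mathbf{C}\mathbf{D})^{-1}\mathbf{D}'\mathbf{C}'$,
and $\mathbf{I}_t$ is an identity matrix of size $t \times t$. When $\mathbf{D} = \mathbf{U}$, $\mathbf{b}_M = \mathbf{b}_R$ (the RLPSI vector of
coefficients), and when $\mathbf{D} = \mathbf{U}$ and $\mathbf{U}'$ is a null matrix, $\mathbf{b}_M = \mathbf{b}$ (the LPSI vector of
coefficients). Thus, the Mallard (1972) index is more general than the RLPSI and is
an optimal PPG-LPSI. In addition, it includes the LPSI and the RLPSI as particular
cases.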
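
Equation (3.11) translates almost line by line into matrix code. The sketch below is illustrative only: P, G, and the economic weights w are made-up inputs (not estimates from the simulated data set), the function name `ppg_lpsi_mallard` is ours, and the restricted traits are assumed to be the first three. It confirms that $\mathbf{M}'\mathbf{b}_M = \mathbf{0}$ and that the covariances $\mathbf{U}'\mathbf{G}\mathbf{b}_M$, to which the expected gains of the restricted traits are proportional, are proportional to $\mathbf{d}$.

```python
import numpy as np

# Illustrative inputs (assumptions, not estimates from the book's data set).
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
G = A @ A.T + np.eye(4)                   # genotypic covariance
P = G + np.diag([2.0, 1.5, 1.0, 0.5])     # phenotypic covariance
w = np.array([1.0, -1.0, 1.0, 1.0])       # hypothetical economic weights
d = np.array([7.0, -3.0, 5.0])            # predetermined proportional gains
U = np.eye(4)[:, :3]                      # U' picks traits 1 to 3
D = np.array([[5.0, 0.0, -7.0],
              [0.0, 5.0, 3.0]]).T         # D' is the Mallard matrix, so D'd = 0

def ppg_lpsi_mallard(P, G, w, U, D):
    """Eq. (3.11): b_M = K_M b, K_M = I - P^{-1}M(M'P^{-1}M)^{-1}M', M = GUD."""
    b = np.linalg.solve(P, G @ w)         # LPSI coefficients b = P^{-1}Gw
    M = G @ U @ D                         # so that M' = D'U'G
    Pinv_M = np.linalg.solve(P, M)
    Q_M = Pinv_M @ np.linalg.solve(M.T @ Pinv_M, M.T)
    K_M = np.eye(len(b)) - Q_M
    return K_M @ b, M

b_M, M = ppg_lpsi_mallard(P, G, w, U, D)
restricted_cov = U.T @ G @ b_M            # proportional to the gains of traits 1 to 3
ratio = restricted_cov / d                # one common value means proportionality to d
print(b_M, ratio)
```

The common ratio may be negative for some inputs, in which case the restricted traits move against the direction set by $\mathbf{d}$.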

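Instead of using restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$ to minimize $E[(I - H)^2]$, we can use
restriction $\mathbf{C}'\mathbf{b} = \theta\mathbf{d}$ and minimize

$$\Phi_T(\mathbf{b}, \mathbf{v}, \theta) = \mathbf{b}'\mathbf{P}\mathbf{b} + \mathbf{w}'\mathbf{G}\mathbf{w} - 2\mathbf{w}'\mathbf{G}\mathbf{b} + 2\mathbf{v}'(\mathbf{C}'\mathbf{b} - \theta\mathbf{d}) \qquad (3.12)$$

with respect to $\mathbf{b}$, $\mathbf{v}$, and $\theta$ (Tallis 1985; Lin 2005), assuming that $\mathbf{P}$, $\mathbf{G}$, $\mathbf{U}'$, $\mathbf{d}$, and
$\mathbf{w}$ are known. The derivative results in matrix notation are

$$\begin{bmatrix} \mathbf{b}_T \\ \mathbf{v} \\ \theta \end{bmatrix} = \begin{bmatrix} \mathbf{P} & \mathbf{C} & \mathbf{0}_{t \times 1} \\ \mathbf{C}' & \mathbf{0}_{r \times r} & -\mathbf{d} \\ \mathbf{0}'_{1 \times t} & -\mathbf{d}' & 0 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{G}\mathbf{w} \\ \mathbf{0} \\ 0 \end{bmatrix}, \qquad (3.13)$$

where $\mathbf{0}_{t \times 1}$ is a null vector $t \times 1$, $\mathbf{0}_{r \times r}$ is a null matrix $r \times r$, and $\mathbf{0}$ on the
right-hand side is a null column vector $r \times 1$; $0$ is the standard zero value. The
inverse of the matrix of coefficients in Eq. (3.13) is not easy to obtain; for this reason,
Tallis (1985) obtained his results in two steps. That is, Tallis (1985) first
differentiated Eq. (3.12) with respect to $\mathbf{b}$ and $\mathbf{v}$, whence he obtained

$$\mathbf{b}_T = \mathbf{b}_R + \theta\boldsymbol{\delta}, \qquad (3.14)$$

where $\mathbf{b}_R = \mathbf{K}\mathbf{b}$ (Eq. 3.5), $\boldsymbol{\delta} = \mathbf{P}^{-1}\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{d}$, and $\mathbf{d}' = [d_1\ d_2\ \cdots\ d_r]$.
Next, he differentiated $E[(\mathbf{b}'_T\mathbf{y} - H)^2]$ only with respect to $\theta$, and his result was

$$\theta = \frac{\mathbf{b}'\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{d}}{\mathbf{d}'(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{d}}, \qquad (3.15)$$

where $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ is the LPSI vector of coefficients, $\mathbf{C}' = \mathbf{U}'\mathbf{G}$, $\mathbf{d}$ is the vector of the
predetermined proportional gains imposed by the breeder, and $\mathbf{P}^{-1}$ is the inverse of
matrix $\mathbf{P}$. When $\theta = 0$, $\mathbf{b}_T = \mathbf{b}_R$, and if $\theta = 0$ and $\mathbf{U}'$ is the null matrix, $\mathbf{b}_T = \mathbf{b}$. That is,
the PPG-LPSI obtained by Tallis (1985) is more general than the RLPSI and the
LPSI. The foregoing results indicate that Eq. (3.14) consists of three parts:

1. Vector $\mathbf{b}_R = \mathbf{K}\mathbf{b}$, which represents the weights of the RLPSI with the restriction
that the expected genetic gain per trait be equal to zero.

2. Vector $\boldsymbol{\delta} = \mathbf{P}^{-1}\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{d}$, which should represent the weights of the
PPG-LPSI leading to the greatest improvement in the desired direction
independently of economic weights.

3. $\theta$ represents the regression coefficient of $H = \mathbf{w}'\mathbf{g}$ on $\boldsymbol{\delta}'\mathbf{y}$, where
$\boldsymbol{\delta} = \mathbf{P}^{-1}\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{d}$ (Itoh and Yamada 1987).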
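
The regression interpretation in point 3 can be verified in two lines. The sketch below assumes, as elsewhere in this chapter, that the covariance between $H = \mathbf{w}'\mathbf{g}$ and an index $\mathbf{b}'\mathbf{y}$ is $\mathbf{w}'\mathbf{G}\mathbf{b}$ and that the variance of the index is $\mathbf{b}'\mathbf{P}\mathbf{b}$; the symbol $\mathbf{S} = \mathbf{C}'\mathbf{P}^{-1}\mathbf{C}$ is shorthand introduced only for this derivation.

```latex
\begin{align*}
\operatorname{Cov}(H, \boldsymbol{\delta}'\mathbf{y})
  &= \mathbf{w}'\mathbf{G}\boldsymbol{\delta}
   = \mathbf{w}'\mathbf{G}\mathbf{P}^{-1}\mathbf{C}\mathbf{S}^{-1}\mathbf{d}
   = \mathbf{b}'\mathbf{C}\mathbf{S}^{-1}\mathbf{d}, \\
\operatorname{Var}(\boldsymbol{\delta}'\mathbf{y})
  &= \boldsymbol{\delta}'\mathbf{P}\boldsymbol{\delta}
   = \mathbf{d}'\mathbf{S}^{-1}\mathbf{C}'\mathbf{P}^{-1}\mathbf{P}\mathbf{P}^{-1}\mathbf{C}\mathbf{S}^{-1}\mathbf{d}
   = \mathbf{d}'\mathbf{S}^{-1}\mathbf{d}, \\
\frac{\operatorname{Cov}(H, \boldsymbol{\delta}'\mathbf{y})}{\operatorname{Var}(\boldsymbol{\delta}'\mathbf{y})}
  &= \frac{\mathbf{b}'\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{d}}
          {\mathbf{d}'(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{d}}
   = \theta .
\end{align*}
```

The last line is exactly Eq. (3.15).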


When $\theta = 1$, Eq. (3.14) is equal to

$$\mathbf{b}_T = \mathbf{b}_R + \boldsymbol{\delta}. \qquad (3.16)$$

The latter equation was the original result obtained by Tallis (1962). Tallis (1962)
differentiated Eq. (3.12) with respect to vectors $\mathbf{b}$ and $\mathbf{v}$ under the restriction
$\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{d}$, i.e., without $\theta$ or with $\theta = 1$. Later, James (1968) maximized the
correlation between $I$ and $H$ ($\rho_{HI}$) under the Tallis (1962) restriction and once more
obtained Eq. (3.16). Mallard (1972) showed that Eq. (3.16) is not optimal, i.e., it does
not minimize $E[(I - H)^2]$ and does not maximize $\rho_{HI}$, and gave the optimal solution,
which we have presented here in Eq. (3.11). Later, using restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$,
Tallis (1985) obtained Eq. (3.14), which also is optimal.
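
The difference between the optimal index (Eq. 3.14) and the non-optimal one (Eq. 3.16) can be illustrated with a short computation. This is a sketch under stated assumptions: P, G, and w are made-up inputs, the first three traits are restricted, and the function names are ours. It checks the two restrictions, $\mathbf{C}'\mathbf{b} = \theta\mathbf{d}$ and $\mathbf{C}'\mathbf{b} = \mathbf{d}$, and compares the correlations with $H$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
G = A @ A.T + np.eye(4)                   # illustrative genotypic covariance
P = G + np.diag([2.0, 1.5, 1.0, 0.5])     # illustrative phenotypic covariance
w = np.array([1.0, -1.0, 1.0, 1.0])       # hypothetical economic weights
d = np.array([7.0, -3.0, 5.0])
C = G[:, :3]                              # C = GU for restrictions on traits 1 to 3

def tallis_parts(P, G, w, C, d):
    """Return b_R, delta, and theta of Eqs. (3.14) and (3.15)."""
    b = np.linalg.solve(P, G @ w)
    Pinv_C = np.linalg.solve(P, C)
    S = C.T @ Pinv_C                      # S = C'P^{-1}C
    Sinv_d = np.linalg.solve(S, d)
    delta = Pinv_C @ Sinv_d
    theta = (b @ C @ Sinv_d) / (d @ Sinv_d)
    b_R = b - Pinv_C @ np.linalg.solve(S, C.T @ b)   # b_R = Kb
    return b_R, delta, theta

def correlation(b_vec):
    """Correlation between I = b'y and H = w'g."""
    return (w @ G @ b_vec) / np.sqrt((w @ G @ w) * (b_vec @ P @ b_vec))

b_R, delta, theta = tallis_parts(P, G, w, C, d)
b_opt = b_R + theta * delta               # Eq. (3.14), Tallis (1985)
b_non = b_R + delta                       # Eq. (3.16), Tallis (1962)
print(theta, correlation(b_opt), correlation(b_non))
```

Both vectors keep the restricted gains proportional to $\mathbf{d}$, but only the first one lets the data choose the scale $\theta$, which is why its correlation with $H$ cannot be lower.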

Figure 3.3 presents the estimated correlation values between the PPG-LPSI and the
net genetic merit ($H = \mathbf{w}'\mathbf{g}$) for the optimal PPG-LPSI (Eq. 3.14) and the non-optimal
PPG-LPSI (Eq. 3.16) using one ($d_1 = 7$), two ($\mathbf{d}' = [7\ -3]$), and three
($\mathbf{d}' = [7\ -3\ 5]$) predetermined restrictions, four traits and 500 simulated genotypes in
one environment for seven selection cycles (see Sect. 2.8.1 of Chap. 2). Note that, in
effect, the non-optimal PPG-LPSI has lower correlations than the optimal PPG-LPSI
for the seven simulated selection cycles.

Fig. 3.3 Estimated correlation values between the predetermined proportional gain linear
phenotypic selection index (PPG-LPSI) and the net genetic merit ($H = \mathbf{w}'\mathbf{g}$) for the optimal and
non-optimal PPG-LPSI using 1 ($d_1 = 7$), 2 ($\mathbf{d}' = [7\ -3]$), and 3 ($\mathbf{d}' = [7\ -3\ 5]$) predetermined
restrictions, 4 traits and 500 simulated genotypes in 1 environment for 7 selection cycles
(horizontal axis: cycle, 1 to 7; vertical axis: correlation values; series: optimum and non-optimum PPG-LPSI)

Let $\mathbf{b}_P = \mathbf{b}_M = \mathbf{b}_T$ be the PPG-LPSI vector of coefficients. Then, the optimal
PPG-LPSI can be written as

$$I_P = \mathbf{b}'_P\mathbf{y}, \qquad (3.17)$$

whereas the maximized correlation between the PPG-LPSI and the net genetic merit
is

$$\rho_{HI_P} = \frac{\mathbf{w}'\mathbf{G}\mathbf{b}_P}{\sqrt{\mathbf{w}'\mathbf{G}\mathbf{w}}\sqrt{\mathbf{b}'_P\mathbf{P}\mathbf{b}_P}}. \qquad (3.18)$$

According to the conditions for constructing a valid PPG-LPSI described in Sect.
2.1 of Chap. 2, the index $I_P = \mathbf{b}'_P\mathbf{y}$ should have a normal distribution. Figure 3.4
presents the distribution of 500 estimated PPG-LPSI values with two ($\mathbf{d}' = [7\ -3]$)
and three ($\mathbf{d}' = [7\ -3\ 5]$) predetermined restrictions respectively, obtained from
one selection cycle, with four traits and 500 genotypes simulated in one environment
(see Chap. 2, Sect. 2.8.1 for details). Figure 3.4 indicates that, in effect, the
PPG-LPSI values approach a normal distribution.

Fig. 3.4 (a) and (b) show the distribution of 500 estimated predetermined proportional gain linear
phenotypic selection index values with two ($\mathbf{d}' = [7\ -3]$) and three ($\mathbf{d}' = [7\ -3\ 5]$)
predetermined restrictions respectively, obtained from one selection cycle for 500 genotypes and
four traits simulated in one environment

Under the predetermined restrictions imposed by the breeder, $I_P = \mathbf{b}'_P\mathbf{y}$ should
have maximal correlation with $H = \mathbf{w}'\mathbf{g}$ and it should be useful for ranking and
selecting among individuals with different net genetic merits. However, for more
than two restrictions the proportionality constant ($\theta$) could be lower than 1; in that
case, $\rho_{HI_P}$ is lower than the correlation between the LPSI and $H = \mathbf{w}'\mathbf{g}$ ($\rho_{HI}$). In addition,
when the restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$ or $\mathbf{C}'\mathbf{b} = \theta\mathbf{d}$ is imposed on the PPG-LPSI vector of
coefficients, the restricted traits decrease their effect on the correlation between
the PPG-LPSI and $H = \mathbf{w}'\mathbf{g}$. Using the simulated data set described in Sect. 2.8.1 of
Chap. 2, we estimated $\rho_{HI_P}$ and $\rho_{HI}$ for seven selection cycles and compared
the results in Fig. 3.5. Correlation $\rho_{HI_P}$ values were estimated using one ($d_1 = 7$),
two ($\mathbf{d}' = [7\ -3]$), and three ($\mathbf{d}' = [7\ -3\ 5]$) predetermined restrictions.
Figure 3.5 indicates that when the number of predetermined restrictions is equal to
or higher than two, the estimated values of $\rho_{HI_P}$ decrease more than when only one
predetermined restriction is imposed on the PPG-LPSI.

The maximized PPG-LPSI selection response and expected genetic gains per trait
can be written as

$$R_P = k_I\sqrt{\mathbf{b}'_M\mathbf{P}\mathbf{b}_M} = k_I\sqrt{\mathbf{b}'_T\mathbf{P}\mathbf{b}_T} \qquad (3.19)$$

and

$$\mathbf{E}_P = k_I\frac{\mathbf{G}\mathbf{b}_M}{\sqrt{\mathbf{b}'_M\mathbf{P}\mathbf{b}_M}} = k_I\frac{\mathbf{G}\mathbf{b}_T}{\sqrt{\mathbf{b}'_T\mathbf{P}\mathbf{b}_T}}, \qquad (3.20)$$

respectively, where $k_I$ is the standardized selection differential or selection intensity
associated with the PPG-LPSI.

Fig. 3.5 Estimated correlation values between the LPSI and the net genetic merit ($H = \mathbf{w}'\mathbf{g}$); and
estimated correlation values between the PPG-LPSI and $H$ with one ($d_1 = 7$), two ($\mathbf{d}' = [7\ -3]$),
and three ($\mathbf{d}' = [7\ -3\ 5]$) predetermined restrictions obtained from seven selection cycles for
four traits and 500 simulated genotypes in one environment

The maximized PPG-LPSI selection response (Eq. 3.19) has the same form as the
maximized LPSI selection response. Thus, under $r$ predetermined restrictions,
Eq. (3.19) predicts the mean improvement in $H$ due to indirect selection on
$I_P = \mathbf{b}'_P\mathbf{y}$. Predetermined restriction effects are observed on the PPG-LPSI expected
genetic gain per trait (Eq. 3.20). The main difference between the RLPSI and the
PPG-LPSI is the vector of predetermined proportional gains.
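
Equations (3.19) and (3.20) can be evaluated directly once $\mathbf{b}_P$ is available. The following sketch uses made-up P, G, and w (assumptions for illustration only), two predetermined restrictions with $\mathbf{d}' = [7\ -3]$ on the first two traits, and the selection intensity $k_I = 1.755$ quoted in the text. It also checks that the weighted sum of the expected gains, $\mathbf{w}'\mathbf{E}_P$, reproduces the selection response $R_P$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
G = A @ A.T + np.eye(4)                   # illustrative genotypic covariance
P = G + np.diag([2.0, 1.5, 1.0, 0.5])     # illustrative phenotypic covariance
w = np.array([1.0, -1.0, 1.0, 1.0])       # hypothetical economic weights
d = np.array([7.0, -3.0])                 # two predetermined restrictions
k_I = 1.755                               # 10% selection intensity, as in the text

# PPG-LPSI coefficients through the Mallard route (Eq. 3.11); here D' = [-3, -7].
M = G[:, :2] @ np.array([-3.0, -7.0])     # M = GUD, a single column in this case
b = np.linalg.solve(P, G @ w)             # LPSI coefficients
Pinv_M = np.linalg.solve(P, M)
b_P = b - Pinv_M * (M @ b) / (M @ Pinv_M) # b_P = K_M b

sigma_I = np.sqrt(b_P @ P @ b_P)          # standard deviation of the index
R_P = k_I * sigma_I                       # Eq. (3.19)
E_P = k_I * (G @ b_P) / sigma_I           # Eq. (3.20)
print(R_P, E_P)
```

The first two entries of `E_P` stay in the ratio 7 to -3, while the unrestricted traits change freely.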


3.2.2 Statistical Properties of the PPG-LPSI 


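Assuming that $H = \mathbf{w}'\mathbf{g}$ and $I_P = \mathbf{b}'_P\mathbf{y}$ have a bivariate joint normal distribution,
$\mathbf{b}_P = \mathbf{K}_M\mathbf{b}$, $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$, and $\mathbf{P}$, $\mathbf{G}$, and $\mathbf{w}$ are known, the PPG-LPSI has the same
properties as the RLPSI. Some of the main PPG-LPSI properties are:

1. Matrices $\mathbf{Q}_M = \mathbf{P}^{-1}\mathbf{M}(\mathbf{M}'\mathbf{P}^{-1}\mathbf{M})^{-1}\mathbf{M}'$ and $\mathbf{K}_M = [\mathbf{I} - \mathbf{Q}_M]$ have the same
function as matrices $\mathbf{Q} = \mathbf{P}^{-1}\mathbf{C}(\mathbf{C}'\mathbf{P}^{-1}\mathbf{C})^{-1}\mathbf{C}'$ and $\mathbf{K} = [\mathbf{I} - \mathbf{Q}]$ in the RLPSI.

2. Matrices $\mathbf{Q}_M$ and $\mathbf{K}_M$ are both projectors, i.e., they are idempotent ($\mathbf{K}_M = \mathbf{K}^2_M$ and
$\mathbf{Q}_M = \mathbf{Q}^2_M$), unique and orthogonal, i.e., $\mathbf{K}_M\mathbf{Q}_M = \mathbf{Q}_M\mathbf{K}_M = \mathbf{0}$.

3. Matrix $\mathbf{Q}_M$ projects $\mathbf{b}$ into a space generated by the columns of matrix $\mathbf{M}$ due to
the restriction $\mathbf{M}'\mathbf{b} = \mathbf{0}$ that is introduced when $\Phi(\mathbf{b}, \mathbf{v})$ is minimized with respect
to $\mathbf{b}$, whereas matrix $\mathbf{K}_M$ projects $\mathbf{b}$ into a space that is perpendicular to the space
generated by the columns of matrix $\mathbf{M}$ (Rao 2002). Thus, the function of matrix
$\mathbf{K}_M$ is to transform vector $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ into vector $\mathbf{b}_P = \mathbf{K}_M\mathbf{b}$.

4. The variance of $I_P = \mathbf{b}'_P\mathbf{y}$ ($\sigma^2_{I_P} = \mathbf{b}'_P\mathbf{P}\mathbf{b}_P$) is equal to the covariance between
$I_P = \mathbf{b}'_P\mathbf{y}$ and $H = \mathbf{w}'\mathbf{g}$ ($\sigma_{HI_P} = \mathbf{w}'\mathbf{G}\mathbf{b}_P$). As $\mathbf{K}_M = \mathbf{K}^2_M$, $\mathbf{K}'_M\mathbf{P} = \mathbf{P}\mathbf{K}_M$, and $\mathbf{b}'\mathbf{P} = \mathbf{w}'\mathbf{G}$,
then

$$\sigma^2_{I_P} = \mathbf{b}'_P\mathbf{P}\mathbf{b}_P = \mathbf{b}'\mathbf{K}'_M\mathbf{P}\mathbf{K}_M\mathbf{b} = \mathbf{b}'\mathbf{P}\mathbf{K}^2_M\mathbf{b} = \mathbf{b}'\mathbf{P}\mathbf{K}_M\mathbf{b} = \mathbf{w}'\mathbf{G}\mathbf{b}_P = \sigma_{HI_P}.$$

5. The maximized correlation between $H$ and $I_P = \mathbf{b}'_P\mathbf{y}$ is equal to $\rho_{HI_P} = \sigma_{I_P}/\sigma_H$. In
point 4 of this subsection, we showed that $\sigma_{HI_P} = \sigma^2_{I_P}$; then

$$\rho_{HI_P} = \frac{\mathbf{w}'\mathbf{G}\mathbf{b}_P}{\sqrt{\mathbf{w}'\mathbf{G}\mathbf{w}}\sqrt{\mathbf{b}'_P\mathbf{P}\mathbf{b}_P}} = \frac{\mathbf{b}'_P\mathbf{P}\mathbf{b}_P}{\sqrt{\mathbf{w}'\mathbf{G}\mathbf{w}}\sqrt{\mathbf{b}'_P\mathbf{P}\mathbf{b}_P}} = \frac{\sigma_{I_P}}{\sigma_H}.$$

6. The variance of the predicted error, $Var(H - I_P) = (1 - \rho^2_{HI_P})\sigma^2_H$, is minimal.
By point 4 of this subsection, $\sigma_{HI_P} = \sigma^2_{I_P}$; then

$$Var(H - I_P) = \sigma^2_H - \sigma^2_{I_P} = (1 - \rho^2_{HI_P})\sigma^2_H.$$

7. The heritability of the PPG-LPSI is equal to $h^2_{I_P} = \dfrac{\mathbf{b}'_P\mathbf{G}\mathbf{b}_P}{\mathbf{b}'_P\mathbf{P}\mathbf{b}_P}$.

Points 1-3 show that, in effect, the PPG-LPSI projects the LPSI vector of
coefficients into a different space than the original LPSI vector of coefficients. In
addition, the PPG-LPSI statistical properties denoted by points 4-7 are the same as
the LPSI statistical properties. Thus, the PPG-LPSI is a variant of the LPSI.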
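
Points 2 and 4 are easy to confirm numerically. The sketch below is an illustration under assumptions: P, G, and w are made-up inputs, the first three traits are restricted, and D' is the Mallard matrix for $\mathbf{d}' = [7\ -3\ 5]$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4))
G = A @ A.T + np.eye(4)                               # illustrative genotypic covariance
P = G + np.diag([2.0, 1.5, 1.0, 0.5])                 # illustrative phenotypic covariance
w = np.array([1.0, -1.0, 1.0, 1.0])                   # hypothetical economic weights
D = np.array([[5.0, 0.0, -7.0], [0.0, 5.0, 3.0]]).T   # Mallard D for d' = [7, -3, 5]
M = G[:, :3] @ D                                      # M = GUD

Pinv_M = np.linalg.solve(P, M)
Q_M = Pinv_M @ np.linalg.solve(M.T @ Pinv_M, M.T)
K_M = np.eye(4) - Q_M
b = np.linalg.solve(P, G @ w)                         # LPSI coefficients
b_P = K_M @ b                                         # PPG-LPSI coefficients

checks = {
    "K_M idempotent": np.allclose(K_M @ K_M, K_M),
    "Q_M idempotent": np.allclose(Q_M @ Q_M, Q_M),
    "K_M Q_M = 0": np.allclose(K_M @ Q_M, 0.0),
    "K_M'P = P K_M": np.allclose(K_M.T @ P, P @ K_M),
    "var(I_P) = cov(H, I_P)": np.isclose(b_P @ P @ b_P, w @ G @ b_P),
}
print(checks)
```

Every entry of `checks` should be true for any positive definite P and any M of full column rank.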


3.2.3 There Is Only One Optimal PPG-LPSI


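Let $\mathbf{S} = \mathbf{C}'\mathbf{P}^{-1}\mathbf{C}$; under the restriction $\mathbf{D}'\mathbf{d} = \mathbf{0}$, Itoh and Yamada (1987) showed that
$\mathbf{D}(\mathbf{D}'\mathbf{S}\mathbf{D})^{-1}\mathbf{D}' = \mathbf{S}^{-1} - \mathbf{S}^{-1}\mathbf{d}(\mathbf{d}'\mathbf{S}^{-1}\mathbf{d})^{-1}\mathbf{d}'\mathbf{S}^{-1}$, whence substituting
$\mathbf{S}^{-1} - \mathbf{S}^{-1}\mathbf{d}(\mathbf{d}'\mathbf{S}^{-1}\mathbf{d})^{-1}\mathbf{d}'\mathbf{S}^{-1}$ for $\mathbf{D}(\mathbf{D}'\mathbf{S}\mathbf{D})^{-1}\mathbf{D}'$ in matrix $\mathbf{Q}_M$, Eq. (3.11) can be written as
Eq. (3.14), i.e., $\mathbf{b}_M = \mathbf{b}_T$. Therefore, the Mallard (1972) and Tallis (1985) vectors of
coefficients are the same. In addition, Itoh and Yamada (1987) showed that the
Harville (1975) vector of coefficients can be written as $\mathbf{b}_T/\sigma_{I_T}$, where $\sigma_{I_T}$ is the
standard deviation of the Tallis (1985) PPG-LPSI. Thus, in reality, there is only one
optimal PPG-LPSI.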
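
Both the Itoh and Yamada (1987) identity and the resulting equality $\mathbf{b}_M = \mathbf{b}_T$ can be confirmed numerically. This is a sketch under assumptions: P, G, and w are made-up inputs, the first three traits are restricted, and $\mathbf{d}' = [7\ -3\ 5]$.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
G = A @ A.T + np.eye(4)                               # illustrative genotypic covariance
P = G + np.diag([2.0, 1.5, 1.0, 0.5])                 # illustrative phenotypic covariance
w = np.array([1.0, -1.0, 1.0, 1.0])                   # hypothetical economic weights
d = np.array([7.0, -3.0, 5.0])
D = np.array([[5.0, 0.0, -7.0], [0.0, 5.0, 3.0]]).T   # satisfies D'd = 0
C = G[:, :3]                                          # C = GU
b = np.linalg.solve(P, G @ w)                         # LPSI coefficients

Pinv_C = np.linalg.solve(P, C)
S = C.T @ Pinv_C                                      # S = C'P^{-1}C
S_inv = np.linalg.inv(S)

# Itoh and Yamada (1987) identity.
lhs = D @ np.linalg.solve(D.T @ S @ D, D.T)
rhs = S_inv - np.outer(S_inv @ d, d @ S_inv) / (d @ S_inv @ d)

# Mallard (Eq. 3.11) and Tallis (Eq. 3.14) coefficient vectors.
b_M = b - Pinv_C @ lhs @ C.T @ b
theta = (b @ C @ S_inv @ d) / (d @ S_inv @ d)
b_T = (b - Pinv_C @ S_inv @ C.T @ b) + theta * (Pinv_C @ S_inv @ d)
print(np.max(np.abs(b_M - b_T)))                      # numerically zero
```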
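Itoh and Yamada (1987) also pointed out that matrix

$$\mathbf{D}' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix}$$

is only one example of several possible Mallard (1972) $\mathbf{D}$ matrices. They showed that
any matrix $\mathbf{D}$ that satisfies condition $\mathbf{D}'\mathbf{d} = \mathbf{0}$ is another Mallard (1972) matrix of
predetermined proportional gains. According to Itoh and Yamada (1987), matrices

$$\mathbf{D}' = \begin{bmatrix} d_2 & -d_1 & 0 & \cdots & 0 & 0 \\ 0 & d_3 & -d_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix} \quad \text{and} \quad \mathbf{D}' = \begin{bmatrix} d_2 & -d_1 & 0 & \cdots & 0 \\ d_3 & 0 & -d_1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_r & 0 & 0 & \cdots & -d_1 \end{bmatrix}$$

are also Mallard (1972) matrices of predetermined proportional gains because they
satisfy condition $\mathbf{D}'\mathbf{d} = \mathbf{0}$. However, the first matrix $\mathbf{D}'$ above, with $d_r$ on the diagonal
and $-d_1, \ldots, -d_{r-1}$ in the last column, is "easier" to construct.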
Harville (1975) maximized the correlation between $I$ and $H$ ($\rho_{IH}$) under the
restriction $\mathbf{C}'\mathbf{b} = \theta\mathbf{d}$ and was the first to point out the importance of the
proportionality constant ($\theta$) in the PPG-LPSI. Mallard (1972) showed that the
restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{d}$ does not maximize the correlation with the net genetic merit
($H = \mathbf{w}'\mathbf{g}$), and Harville (1975) indicated that the restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \mathbf{d}$ only changes
the sign of the genetic expected gain (or multi-trait selection response) but does not
maximize the correlation between $I = \mathbf{b}'\mathbf{y}$ and $H = \mathbf{w}'\mathbf{g}$. According to Mallard (1972),
Harville (1975), and Tallis (1985), the PPG-LPSI is optimal only under the
restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$.
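
Returning to the alternative Mallard (1972) matrices listed earlier in this subsection, the sketch below builds the three constructions for $\mathbf{d}' = [7\ -3\ 5]$ and checks two things: each satisfies $\mathbf{D}'\mathbf{d} = \mathbf{0}$, and all of them yield the same matrix $\mathbf{D}(\mathbf{D}'\mathbf{S}\mathbf{D})^{-1}\mathbf{D}'$, which is the only way $\mathbf{D}$ enters $\mathbf{Q}_M$. The matrix S used here is an arbitrary symmetric positive definite matrix chosen for illustration, and the dictionary labels are ours.

```python
import numpy as np

d = np.array([7.0, -3.0, 5.0])
candidates = {
    "d_r on the diagonal": np.array([[5.0, 0.0, -7.0], [0.0, 5.0, 3.0]]),
    "adjacent pairs":      np.array([[-3.0, -7.0, 0.0], [0.0, 5.0, 3.0]]),
    "pairs with d_1":      np.array([[-3.0, -7.0, 0.0], [5.0, 0.0, -7.0]]),
}
S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])            # any symmetric positive definite S (illustrative)

projectors = []
for name, D_t in candidates.items():
    assert np.allclose(D_t @ d, 0.0), name            # condition D'd = 0
    D = D_t.T
    projectors.append(D @ np.linalg.solve(D.T @ S @ D, D.T))

same = all(np.allclose(projectors[0], X) for X in projectors[1:])
print(same)
```

The result does not depend on which construction is used because all three matrices have the same row space, namely the vectors orthogonal to $\mathbf{d}$.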

Itoh and Yamada (1987) pointed out several problems associated with the Tallis
(1985) PPG-LPSI:

1. When the number of restrictions imposed on the PPG-LPSI expected genetic gains
increases, $\theta$ tends to zero and then the accuracy of the PPG-LPSI decreases.

2. The $\theta$ values could be negative, in which case PPG-LPSI results have no meaning
in practice.

3. The PPG-LPSI may cause the population means to shift in the opposite direction to
the predetermined desired direction; this may happen because of the opposite
directions between the economic values and the predetermined desired direction.

Itoh and Yamada (1987) thought that one possible solution to those problems
could be to use the linear phenotypic selection index with desired gains.


3.2.4 Numerical Examples 


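The estimated phenotypic ($\widehat{\mathbf{P}}$) and genetic ($\widehat{\mathbf{G}}$) covariance matrices described in Sect.
3.1.4 of this chapter for the RLPSI are used as the first example. First, Eq. (3.11) is
used to obtain the PPG-LPSI vector of coefficients. Let $\mathbf{d}' = [7\ -3]$ be the
vector for two predetermined restrictions; then the Mallard (1972) matrix is
$\mathbf{D}' = [-3\ -7]$, while matrix $\mathbf{U}'$ is

$$\mathbf{U}'_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.$$

Matrix $\mathbf{M}' = \mathbf{D}'\mathbf{U}'\mathbf{G}$ for two predetermined restrictions will be
$\widehat{\mathbf{M}}' = \mathbf{D}'\mathbf{U}'_2\widehat{\mathbf{G}} = [-18.12\ -52.49\ -1.25\ 7.46]$, whence

$$\widehat{\mathbf{Q}}_M = \widehat{\mathbf{P}}^{-1}\widehat{\mathbf{M}}(\widehat{\mathbf{M}}'\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{M}})^{-1}\widehat{\mathbf{M}}' = \begin{bmatrix} 0.084 & 0.242 & 0.006 & -0.034 \\ 0.313 & 0.906 & 0.022 & -0.129 \\ 0.037 & 0.106 & 0.003 & -0.015 \\ -0.019 & -0.055 & -0.001 & 0.008 \end{bmatrix}$$

and

$$\widehat{\mathbf{K}}_M = [\mathbf{I}_4 - \widehat{\mathbf{Q}}_M] = \begin{bmatrix} 0.916 & -0.242 & -0.006 & 0.034 \\ -0.313 & 0.094 & -0.022 & 0.129 \\ -0.037 & -0.106 & 0.997 & 0.015 \\ 0.019 & 0.055 & 0.001 & 0.992 \end{bmatrix},$$

where $\mathbf{I}_4$ is an identity matrix of size $4 \times 4$.

The estimated LPSI and PPG-LPSI vectors of coefficients were
$\widehat{\mathbf{b}}' = [0.554\ -1.053\ 1.090\ 1.058]$ and $\widehat{\mathbf{b}}'_M = (\widehat{\mathbf{K}}_M\widehat{\mathbf{b}})' = [0.793\ -0.159\ 1.194\ 1.004]$
respectively, and the estimated PPG-LPSI was
$\widehat{I}_M = 0.793T_1 - 0.159T_2 + 1.194T_3 + 1.004T_4$. The estimated standard deviation
of $\widehat{I}_M$ was $\widehat{\sigma}_{I_M} = \sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M} = 9.526$, whereas the estimated correlation value
between the PPG-LPSI and the net genetic merit was $\widehat{\rho}_{HI_M} = \widehat{\sigma}_{I_M}/\widehat{\sigma}_H = 0.85$, where
$\widehat{\sigma}_H = \sqrt{\mathbf{w}'\widehat{\mathbf{G}}\mathbf{w}} = 11.202$ is the estimated standard deviation of the net
genetic merit.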
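
The product $\mathbf{M}' = \mathbf{D}'\mathbf{U}'\mathbf{G}$ used in this example can be reproduced from the rows of $\mathbf{C}' = \mathbf{U}'\mathbf{G}$ that are printed later in this section (the first rows of the estimated $\mathbf{G}$). The sketch below does so for $\mathbf{d}' = [7\ -3]$ and, with the Mallard matrix built in the same way, for $\mathbf{d}' = [7\ -3\ 5]$; the variable names are ours.

```python
import numpy as np

# Rows of C' = U'G as printed in this section (first two and first three rows of G).
C3_t = np.array([[36.21, -12.93, 8.35, 2.74],
                 [-12.93, 13.04, -3.40, -2.24],
                 [8.35, -3.40, 9.96, 0.16]])
C2_t = C3_t[:2]

D2_t = np.array([[-3.0, -7.0]])                       # Mallard matrix for d' = [7, -3]
D3_t = np.array([[5.0, 0.0, -7.0], [0.0, 5.0, 3.0]])  # Mallard matrix for d' = [7, -3, 5]

M2_t = D2_t @ C2_t    # M' for two predetermined restrictions
M3_t = D3_t @ C3_t    # M' for three predetermined restrictions
print(np.round(M2_t, 2))
print(np.round(M3_t, 2))
```

The first product matches the row vector quoted above, and the second matches the first of the three-restriction matrices reported further on.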

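Suppose that the selection intensity was 10% ($k_I = 1.755$); then, the estimated
PPG-LPSI expected genetic gain per trait and the estimated selection response are

$$\widehat{\mathbf{E}}'_M = 1.755\frac{\widehat{\mathbf{b}}'_M\widehat{\mathbf{G}}}{\sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M}} = [8.013\ -3.434\ 3.541\ 1.730] \quad \text{and} \quad \widehat{R}_M = (1.755)\sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M} = (1.755)(9.526) = 16.717,$$

respectively.

Now, let $\mathbf{d}'_3 = [7\ -3\ 5]$ be the vector for three predetermined restrictions;
then there are three possible predetermined Mallard matrices, $\mathbf{D}'_1$, $\mathbf{D}'_2$, and $\mathbf{D}'_3$, the
first of which is

$$\mathbf{D}'_1 = \begin{bmatrix} 5 & 0 & -7 \\ 0 & 5 & 3 \end{bmatrix},$$

and matrix $\mathbf{U}'$ for three restrictions is

$$\mathbf{U}'_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

Thus, for three predetermined restrictions matrix $\mathbf{M}' = \mathbf{D}'\mathbf{U}'\mathbf{G}$ shall have three possible forms,

$$\widehat{\mathbf{M}}'_1 = \mathbf{D}'_1\mathbf{U}'_3\widehat{\mathbf{G}} = \begin{bmatrix} 122.60 & -40.85 & -27.97 & 12.58 \\ -39.60 & 55.00 & 12.88 & -10.72 \end{bmatrix}$$

and

$$\widehat{\mathbf{M}}'_2 = \mathbf{D}'_2\mathbf{U}'_3\widehat{\mathbf{G}} = \widehat{\mathbf{M}}'_3 = \begin{bmatrix} -18.12 & -52.49 & -1.25 & 7.46 \\ 122.60 & -40.85 & -27.97 & 12.58 \end{bmatrix}.$$

Note that matrix $\widehat{\mathbf{M}}'_1$ is different from matrices $\widehat{\mathbf{M}}'_2$ and $\widehat{\mathbf{M}}'_3$, and that the two latter are the same;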


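however, both matrices should lead to the same estimated PPG-LPSI vector of
coefficients and to the same estimated PPG-LPSI expected genetic gain per trait
and selection response. It can be shown that for matrices $\widehat{\mathbf{M}}'_1$, $\widehat{\mathbf{M}}'_2$, and $\widehat{\mathbf{M}}'_3$, matrices
$\widehat{\mathbf{Q}}_M$ and $\widehat{\mathbf{K}}_M = [\mathbf{I}_4 - \widehat{\mathbf{Q}}_M]$ are the same and can be written as

$$\widehat{\mathbf{Q}}_M = \begin{bmatrix} 0.771 & 0.080 & -0.145 & 0.026 \\ 0.123 & 0.951 & 0.063 & -0.145 \\ -1.131 & 0.382 & 0.258 & -0.117 \\ 0.118 & -0.087 & -0.031 & 0.020 \end{bmatrix}$$

and

$$\widehat{\mathbf{K}}_M = \begin{bmatrix} 0.229 & -0.080 & 0.145 & -0.026 \\ -0.123 & 0.049 & -0.063 & 0.145 \\ 1.131 & -0.382 & 0.742 & 0.117 \\ -0.118 & 0.087 & 0.031 & 0.980 \end{bmatrix}.$$

The estimated LPSI vector of coefficients was equal to
$\widehat{\mathbf{b}}' = [0.554\ -1.053\ 1.090\ 1.058]$, whereas the estimated PPG-LPSI vector
of coefficients was $\widehat{\mathbf{b}}'_M = (\widehat{\mathbf{K}}_M\widehat{\mathbf{b}})' = [0.342\ -0.035\ 1.960\ 0.914]$. The estimated
PPG-LPSI was $\widehat{I}_M = 0.342T_1 - 0.035T_2 + 1.960T_3 + 0.914T_4$ and the estimated
standard deviation of $\widehat{I}_M$ was $\widehat{\sigma}_{I_M} = \sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M} = 8.68$. The
estimated correlation value between the PPG-LPSI and the net genetic merit was
$\widehat{\rho}_{HI_M} = \widehat{\sigma}_{I_M}/\widehat{\sigma}_H = 0.775$, where $\widehat{\sigma}_H = \sqrt{\mathbf{w}'\widehat{\mathbf{G}}\mathbf{w}} = 11.202$ is the estimated standard
deviation of the net genetic merit.

Using a selection intensity of 10% ($k_I = 1.755$), the estimated PPG-LPSI
expected genetic gain per trait and the estimated selection response were

$$\widehat{\mathbf{E}}'_M = 1.755\frac{\widehat{\mathbf{b}}'_M\widehat{\mathbf{G}}}{\sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M}} = [6.410\ -2.747\ 4.579\ 1.496] \quad \text{and} \quad \widehat{R}_M = (1.755)\sqrt{\widehat{\mathbf{b}}'_M\widehat{\mathbf{P}}\widehat{\mathbf{b}}_M} = (1.755)(8.68) = 15.23,$$

respectively.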
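
The arithmetic behind the two worked examples can be retraced from the rounded values printed above. The sketch below recomputes the selection responses (Eq. 3.19) and the correlations $\sigma_I/\sigma_H$, and checks that the expected gains of the restricted traits are proportional to $\mathbf{d}$. Small discrepancies in the last digit are expected because the inputs are rounded; the product (1.755)(8.68) agrees with the 15.23 listed for cycle 1 in Table 3.4.

```python
k_I = 1.755          # selection intensity for 10% selection
sigma_H = 11.202     # estimated standard deviation of H

results = {}
for label, sigma_I in (("two restrictions", 9.526), ("three restrictions", 8.68)):
    results[label] = {
        "response": k_I * sigma_I,          # Eq. (3.19)
        "correlation": sigma_I / sigma_H,   # rho = sigma_I / sigma_H
    }

# Restricted-trait gains divided by d should give one common factor per index.
ratios_two = [8.013 / 7, -3.434 / -3]
ratios_three = [6.410 / 7, -2.747 / -3, 4.579 / 5]
print(results, ratios_two, ratios_three)
```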

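According to Eq. (3.14), the estimated Tallis (1985) vector of coefficients can be
obtained as $\widehat{\mathbf{b}}_T = \widehat{\mathbf{b}}_R + \widehat{\theta}\widehat{\boldsymbol{\delta}}$, where $\widehat{\mathbf{b}}_R = \widehat{\mathbf{K}}\widehat{\mathbf{b}}$ is the estimated RLPSI vector of
coefficients, $\widehat{\boldsymbol{\delta}} = \widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}(\widehat{\mathbf{C}}'\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}})^{-1}\mathbf{d}$,

$$\widehat{\theta} = \frac{\widehat{\mathbf{b}}'\widehat{\mathbf{C}}(\widehat{\mathbf{C}}'\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}})^{-1}\mathbf{d}}{\mathbf{d}'(\widehat{\mathbf{C}}'\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}})^{-1}\mathbf{d}}$$

is the estimated constant of proportionality, $\widehat{\mathbf{b}} = \widehat{\mathbf{P}}^{-1}\widehat{\mathbf{G}}\mathbf{w}$ is the estimated LPSI vector
of coefficients, and $\mathbf{d}' = [d_1\ d_2\ \cdots\ d_r]$ is the vector of predetermined restrictions.

In Sect. 3.1.4 of this chapter we described how to obtain $\widehat{\mathbf{b}}_R = \widehat{\mathbf{K}}\widehat{\mathbf{b}}$, and we also
obtained matrix $\widehat{\mathbf{C}}' = \mathbf{U}'\widehat{\mathbf{G}}$ for two and three null restrictions as

$$\widehat{\mathbf{C}}'_2 = \mathbf{U}'_2\widehat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \end{bmatrix} \quad \text{and} \quad \widehat{\mathbf{C}}'_3 = \mathbf{U}'_3\widehat{\mathbf{G}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.40 & -2.24 \\ 8.35 & -3.40 & 9.96 & 0.16 \end{bmatrix},$$

whence the $\widehat{\mathbf{b}}_R = \widehat{\mathbf{K}}\widehat{\mathbf{b}}$ values for two and three null restrictions were
$\widehat{\mathbf{b}}'_{R_2} = [-0.164\ 0.162\ 0.680\ 0.856]$ and $\widehat{\mathbf{b}}'_{R_3} = [-0.032\ 0.136\ 0.059\ 0.890]$ respectively.

The $\theta$ and $\boldsymbol{\delta}$ values for two and three predetermined restrictions were

$$\widehat{\theta}_2 = \frac{\widehat{\mathbf{b}}'\widehat{\mathbf{C}}_2(\widehat{\mathbf{C}}'_2\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}_2)^{-1}\mathbf{d}_2}{\mathbf{d}'_2(\widehat{\mathbf{C}}'_2\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}_2)^{-1}\mathbf{d}_2} = 6.213, \qquad \widehat{\theta}_3 = \frac{\widehat{\mathbf{b}}'\widehat{\mathbf{C}}_3(\widehat{\mathbf{C}}'_3\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}_3)^{-1}\mathbf{d}_3}{\mathbf{d}'_3(\widehat{\mathbf{C}}'_3\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}_3)^{-1}\mathbf{d}_3} = 4.529,$$

$\widehat{\boldsymbol{\delta}}'_2 = (\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}_2(\widehat{\mathbf{C}}'_2\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}_2)^{-1}\mathbf{d}_2)' = [0.153\ -0.052\ 0.083\ 0.024]$, and
$\widehat{\boldsymbol{\delta}}'_3 = (\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}_3(\widehat{\mathbf{C}}'_3\widehat{\mathbf{P}}^{-1}\widehat{\mathbf{C}}_3)^{-1}\mathbf{d}_3)' = [0.083\ -0.038\ 0.420\ 0.005]$. With these
results, the estimated Tallis (1985) vectors of coefficients for two and three
predetermined restrictions were $\widehat{\mathbf{b}}'_{T_2} = [0.793\ -0.159\ 1.194\ 1.004]$ and
$\widehat{\mathbf{b}}'_{T_3} = [0.342\ -0.035\ 1.960\ 0.914]$ respectively. These latter two vectors of
coefficients are the same as the vectors of coefficients obtained using the Mallard (1972)
method for two and three predetermined restrictions. These results corroborate that,
in effect, the Mallard (1972) and Tallis (1985) PPG-LPSIs are the same.

With the data set described in Sect. 2.8.1 of Chap. 2 we constructed Table 3.4,
which presents the estimated LPSI selection response and heritability, and the
estimated PPG-LPSI selection response and heritability for one, two, and three
predetermined restrictions for seven simulated selection cycles using a selection
intensity of 10% ($k_I = 1.755$). The averages of the estimated PPG-LPSI selection
responses were 14.19, 14.00, and 12.58 for one, two, and three restrictions
respectively. Note that 14.19 is also the average value for the estimated LPSI selection
response. This means that the PPG-LPSI and the LPSI selection responses are the
same for only one predetermined restriction. However, the estimated PPG-LPSI
selection responses for two and three restrictions tend to decrease (Table 3.4). The
same is true for the estimated PPG-LPSI heritability. That is, the estimated
PPG-LPSI heritability for one predetermined restriction is equal to the estimated
LPSI heritability. The estimated PPG-LPSI heritability for two predetermined
restrictions decreased, but increased for three predetermined restrictions
(Table 3.4). This is because the simulated true heritabilities of traits $T_1$, $T_2$, $T_3$, and
$T_4$ were 0.4, 0.6, 0.6, and 0.8 respectively.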
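
Going back to the Tallis (1985) decomposition reported above, the printed values of $\widehat{\mathbf{b}}_R$, $\widehat{\theta}$, and $\widehat{\boldsymbol{\delta}}$ can be recombined through Eq. (3.14) and compared with the printed $\widehat{\mathbf{b}}_T$. The sketch below only uses numbers quoted in this section; the dictionary layout is ours. Because the inputs are rounded to three decimals, agreement is expected to within about 0.01.

```python
import numpy as np

cases = {
    "two restrictions": dict(
        b_R=[-0.164, 0.162, 0.680, 0.856], theta=6.213,
        delta=[0.153, -0.052, 0.083, 0.024],
        b_T=[0.793, -0.159, 1.194, 1.004]),
    "three restrictions": dict(
        b_R=[-0.032, 0.136, 0.059, 0.890], theta=4.529,
        delta=[0.083, -0.038, 0.420, 0.005],
        b_T=[0.342, -0.035, 1.960, 0.914]),
}

max_gap = {}
for name, c in cases.items():
    rebuilt = np.array(c["b_R"]) + c["theta"] * np.array(c["delta"])   # Eq. (3.14)
    max_gap[name] = float(np.max(np.abs(rebuilt - np.array(c["b_T"]))))
print(max_gap)   # largest absolute difference per case
```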

Table 3.5 presents the estimated LPSI expected genetic gain per trait without
restrictions, and the estimated PPG-LPSI expected genetic gain per trait for one, two,
and three predetermined restrictions for seven simulated selection cycles using a
selection intensity of 10% ($k_I = 1.755$). Once again, note that for one predetermined
restriction, the estimated PPG-LPSI expected genetic gains were equal to the
estimated LPSI expected genetic gains, and for two predetermined restrictions, the
estimated PPG-LPSI expected genetic gains were similar to the estimated LPSI
expected genetic gains; however, for three predetermined restrictions, the estimated
PPG-LPSI expected genetic gains tended to decrease.


Table 3.4 Estimated LPSI selection response and heritability, and estimated predetermined
proportional gain LPSI (PPG-LPSI) selection response and heritability for one, two, and three
predetermined restrictions for seven simulated selection cycles

Cycle   | LPSI selection response | LPSI heritability | PPG-LPSI selection response, 3 restrictions | PPG-LPSI heritability, 1 restriction | PPG-LPSI heritability, 2 restrictions | PPG-LPSI heritability, 3 restrictions
1       | 17.81 | 0.84 | 15.23 | 0.84 | 0.77 | 0.83
2       | 15.69 | 0.80 | 14.39 | 0.80 | 0.78 | 0.83
3       | 14.22 | 0.77 | 13.18 | 0.77 | 0.76 | 0.80
4       | 14.34 | 0.76 | 11.56 | 0.76 | 0.75 | 0.78
5       | 13.64 | 0.75 | 12.16 | 0.75 | 0.75 | 0.79
6       | 12.04 | 0.71 | 10.77 | 0.71 | 0.71 | 0.76
7       | 11.61 | 0.72 | 10.75 | 0.72 | 0.71 | 0.76
Average | 14.19 | 0.76 | 12.58 | 0.76 | 0.75 | 0.79

The selection intensity was 10% ($k_I = 1.755$) and the vectors of predetermined proportional gains
for one, two, and three predetermined restrictions were $d_1 = 7$, $\mathbf{d}' = [7\ -3]$, and
$\mathbf{d}' = [7\ -3\ 5]$ respectively

Table 3.5 Estimated LPSI expected genetic gain per trait, and estimated PPG-LPSI expected
genetic gain per trait for one, two, and three predetermined restrictions for seven simulated selection
cycles

        | LPSI expected gain per trait  | PPG-LPSI expected gain per trait for one restriction
Cycle   | T1   | T2    | T3   | T4   | T1   | T2    | T3   | T4
1       | 7.90 | -4.67 | 3.33 | 1.92 | 7.90 | -4.67 | 3.33 | 1.92
2       | 7.06 | -3.59 | 3.17 | 1.86 | 7.06 | -3.59 | 3.17 | 1.86
3       | 6.67 | -3.21 | 2.82 | 1.52 | 6.67 | -3.21 | 2.82 | 1.52
4       | 7.53 | -3.45 | 2.07 | 1.29 | 7.53 | -3.45 | 2.07 | 1.29
5       | 7.14 | -2.66 | 2.51 | 1.33 | 7.14 | -2.66 | 2.51 | 1.33
6       | 6.23 | -2.62 | 1.98 | 1.21 | 6.23 | -2.62 | 1.98 | 1.21
7       | 5.38 | -2.55 | 2.47 | 1.22 | 5.38 | -2.55 | 2.47 | 1.22
Average | 6.85 | -3.25 | 2.62 | 1.48 | 6.85 | -3.25 | 2.62 | 1.48

        | PPG-LPSI expected gain per trait for two restrictions | PPG-LPSI expected gain per trait for three restrictions
Cycle   | T1   | T2    | T3   | T4   | T1   | T2    | T3   | T4
1       | 8.01 | -3.43 | 3.54 | 1.73 | 6.41 | -2.75 | 4.58 | 1.50
2       | 7.39 | -3.17 | 3.22 | 1.81 | 5.89 | -2.52 | 4.21 | 1.77
3       | 6.86 | -2.94 | 2.77 | 1.60 | 5.48 | -2.35 | 3.91 | 1.45
4       | 7.65 | -3.28 | 2.12 | 1.27 | 4.76 | -2.04 | 3.40 | 1.35
5       | 6.88 | -2.95 | 2.41 | 1.33 | 5.08 | -2.18 | 3.63 | 1.28
6       | 6.20 | -2.66 | 1.98 | 1.21 | 4.39 | -1.88 | 3.14 | 1.36
7       | 5.50 | -2.36 | 2.53 | 1.19 | 4.41 | -1.89 | 3.15 | 1.30
Average | 6.93 | -2.97 | 2.65 | 1.45 | 5.20 | -2.23 | 3.72 | 1.43

The selection intensity was 10% ($k_I = 1.755$) and the vectors of predetermined proportional gains
for one, two, and three restrictions were $d_1 = 7$, $\mathbf{d}' = [7\ -3]$, and $\mathbf{d}' = [7\ -3\ 5]$ respectively
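
A quick consistency check on Table 3.5 is that, within each cycle, the PPG-LPSI gains of the restricted traits divided by the corresponding elements of $\mathbf{d}$ should give a single common factor. The sketch below computes, for the two-restriction and three-restriction blocks, the largest within-cycle spread of these ratios; the helper name `spread` is ours, and small non-zero spreads are expected because the table is rounded to two decimals.

```python
import numpy as np

d2 = np.array([7.0, -3.0])
d3 = np.array([7.0, -3.0, 5.0])

# Restricted-trait columns of Table 3.5, cycles 1 to 7.
two = np.array([[8.01, -3.43], [7.39, -3.17], [6.86, -2.94], [7.65, -3.28],
                [6.88, -2.95], [6.20, -2.66], [5.50, -2.36]])
three = np.array([[6.41, -2.75, 4.58], [5.89, -2.52, 4.21], [5.48, -2.35, 3.91],
                  [4.76, -2.04, 3.40], [5.08, -2.18, 3.63], [4.39, -1.88, 3.14],
                  [4.41, -1.89, 3.15]])

def spread(gains, d):
    """Largest within-cycle difference between the ratios gain_q / d_q."""
    ratios = gains / d
    return float(np.max(ratios.max(axis=1) - ratios.min(axis=1)))

print(spread(two, d2), spread(three, d3))
```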


The first part of Table 3.6 presents the estimated correlation of the net genetic
merit ($H = \mathbf{w}'\mathbf{g}$) with the estimated LPSI and RLPSI values for one, two, and three
null restrictions. In addition, this first part presents the estimated LPSI versus RLPSI
efficiency $p = 100(\lambda_R - 1)$ (Eq. 2.21, Chap. 2). The second part of Table 3.6 presents
the estimated correlation of $H = \mathbf{w}'\mathbf{g}$ with the estimated LPSI and PPG-LPSI values
for one, two, and three predetermined restrictions, and the estimated LPSI versus
PPG-LPSI efficiency $p = 100(\lambda_P - 1)$. Finally, the third part of Table 3.6 presents the
estimated variance of the predicted error (VPE) of the LPSI ($(1 - \rho^2_{HI})\sigma^2_H$), the
RLPSI ($(1 - \rho^2_{HI_R})\sigma^2_H$), and the PPG-LPSI ($(1 - \rho^2_{HI_P})\sigma^2_H$) for one, two, and three
restrictions for seven simulated selection cycles.

The estimated VPE of the RLPSI is higher than that of the LPSI and PPG-LPSI
for one, two, and three restrictions for the seven simulated selection cycles; however,
the estimated VPE of the PPG-LPSI is only greater than that of the LPSI for two and
three predetermined restrictions.




Table 3.6 Correlation of the net genetic merit with the LPSI, the RLPSI, and the PPG-LPSI for
one, two, and three null and predetermined restrictions; LPSI versus RLPSI efficiency and LPSI
versus PPG-LPSI efficiency, and estimated variance of the predicted error (VPE) of the LPSI, the
RLPSI, and the PPG-LPSI for one, two, and three restrictions for seven simulated selection cycles

        |                  | RLPSI correlation for one, two, and three null restrictions | LPSI versus RLPSI efficiency (%) for one, two, and three null restrictions
Cycle   | LPSI correlation | 1    | 2    | 3    | 1      | 2      | 3
1       | 0.91             | 0.35 | 0.28 | 0.21 | 159.16 | 221.34 | 331.65
2       | 0.88             | 0.48 | 0.33 | 0.24 | 85.69  | 164.19 | 267.25
3       | 0.87             | 0.44 | 0.35 | 0.25 | 98.42  | 145.51 | 241.61
4       | 0.86             | 0.40 | 0.30 | 0.22 | 114.77 | 183.56 | 285.28
5       | 0.86             | 0.38 | 0.32 | 0.20 | 126.47 | 164.15 | 321.00
6       | 0.83             | 0.44 | 0.36 | 0.23 | 89.09  | 132.96 | 264.22
7       | 0.83             | 0.41 | 0.32 | 0.22 | 101.23 | 161.60 | 275.26
Average | 0.86             | 0.41 | 0.32 | 0.23 | 110.69 | 167.62 | 283.75

        |                  | PPG-LPSI correlation for one, two, and three predetermined restrictions | LPSI versus PPG-LPSI efficiency (%) for one, two, and three predetermined restrictions
Cycle   | LPSI correlation | 1    | 2    | 3    | 1 | 2     | 3
1       | 0.91             | 0.91 | 0.85 | 0.77 | 0 | 17.13 | 22.74
2       | 0.88             | 0.88 | 0.88 | 0.81 | 0 | 3.44  | 10.42
3       | 0.87             | 0.87 | 0.86 | 0.80 | 0 | 3.35  | 10.21
4       | 0.86             | 0.86 | 0.86 | 0.70 | 0 | 2.32  | 22.96
5       | 0.86             | 0.86 | 0.85 | 0.76 | 0 | 0.30  | 10.09
6       | 0.83             | 0.83 | 0.83 | 0.74 | 0 | 0.83  | 11.13
7       | 0.83             | 0.83 | 0.83 | 0.77 | 0 | 2.35  | 7.74
Average | 0.86             | 0.86 | 0.85 | 0.77 | 0 | 4.25  | 13.61

        |          | RLPSI VPE for one, two, and three null restrictions | PPG-LPSI VPE for one, two, and three predetermined restrictions
Cycle   | LPSI VPE | 1      | 2      | 3      | 1     | 2     | 3
1       | 22.53    | 110.16 | 115.52 | 119.96 | 22.53 | 50.44 | 57.14
2       | 22.66    | 79.40  | 91.13  | 96.65  | 22.66 | 27.88 | 37.03
3       | 21.95    | 70.92  | 76.70  | 81.97  | 21.95 | 26.14 | 33.55
4       | 22.84    | 75.16  | 81.33  | 85.14  | 22.84 | 25.84 | 45.46
5       | 22.13    | 70.75  | 73.86  | 79.11  | 22.13 | 22.49 | 32.69
6       | 21.18    | 55.07  | 59.56  | 64.68  | 21.18 | 21.95 | 30.13
7       | 19.47    | 52.44  | 56.85  | 60.14  | 19.47 | 21.45 | 25.53
Average | 21.82    | 73.41  | 79.28  | 83.95  | 21.82 | 28.03 | 37.36
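
The two summary statistics reported in Table 3.6 are simple functions of the correlations. The sketch below assumes that $\lambda$ in $p = 100(\lambda - 1)$ is the ratio of the LPSI correlation to the correlation of the restricted index (Eq. 2.21 is defined in Chap. 2 and is not reproduced here), and that the worked example with $\widehat{\sigma}_H = 11.202$ corresponds to cycle 1. It uses the rounded cycle 1 correlations for one null restriction, so only approximate agreement with the tabulated 159.16 and 110.16 should be expected.

```python
def efficiency(rho_lpsi, rho_other):
    """p = 100(lambda - 1), with lambda assumed to be rho_lpsi / rho_other."""
    return 100.0 * (rho_lpsi / rho_other - 1.0)

def vpe(rho, sigma_H):
    """Variance of the predicted error, (1 - rho^2) * sigma_H^2."""
    return (1.0 - rho ** 2) * sigma_H ** 2

# Cycle 1, one null restriction (rounded correlations from Table 3.6).
p_rlpsi = efficiency(0.91, 0.35)
vpe_rlpsi = vpe(0.35, 11.202)
print(round(p_rlpsi, 2), round(vpe_rlpsi, 2))
```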


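Thus, according to the results obtained for the LPSI, the RLPSI, and the
PPG-LPSI, the best predictor of the net genetic merit was the LPSI followed by
the PPG-LPSI and the RLPSI.


3.3 The Desired Gains Linear Phenotypic Selection Index


The most important aspect of the desired gains linear phenotypic selection index
(DG-LPSI) is that it does not require economic weights. Note that the LPSI expected
genetic gain per trait $\mathbf{E} = k_I\dfrac{\mathbf{G}\mathbf{b}}{\sigma_I}$ is maximized when $\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\mathbf{w}$ and is proportional to
$k_I$ and $1/\sigma_I$. Now let $\mathbf{G}\mathbf{b}$ be written as

$$\mathbf{G}\mathbf{b} = \mathbf{d}, \qquad (3.21)$$

where $\mathbf{d}$ is the vector of desired gains. From Eq. (3.21), $\mathbf{E}$ can be written as

$$\mathbf{E} = k_I\frac{\mathbf{d}}{\sigma_I}. \qquad (3.22)$$

Equation (3.22) indicates that $\mathbf{E}$ is inversely proportional to $\sigma_I$; then we can
minimize $\sigma^2_I$ with respect to $\mathbf{b}$ subject to the constraint $\mathbf{G}\mathbf{b} = \mathbf{d}$, and then $\mathbf{E}$ is
maximized (Brascamp 1984; Itoh and Yamada 1986). That is, we need to take the
derivative of the function

$$\Phi_{DG}(\mathbf{b}, \mathbf{v}) = 0.5(\mathbf{b}'\mathbf{P}\mathbf{b}) + \mathbf{v}'(\mathbf{G}\mathbf{b} - \mathbf{d}) \qquad (3.23)$$

with respect to $\mathbf{b}$ and $\mathbf{v}$, where $\mathbf{v}$ is a vector of Lagrange multipliers, assuming that $\mathbf{P}$,
$\mathbf{G}$, and $\mathbf{d}$ are known. The restriction $\mathbf{G}\mathbf{b} = \mathbf{d}$ in Eq. (3.23) is similar to the Tallis
(1985) restriction $\mathbf{U}'\mathbf{G}\mathbf{b} = \theta\mathbf{d}$, but with $\mathbf{U}' = \mathbf{I}$ and $\theta = 1$, as in Tallis (1962).

It can be shown that the vector that minimizes $\sigma^2_I$ and maximizes $\mathbf{E}$ can be written
as

$$\mathbf{b}_{DG} = \mathbf{P}^{-1}\mathbf{G}(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d}. \qquad (3.24)$$

Thus, in effect, as $\mathbf{G}\mathbf{b} = \mathbf{d}$, $\mathbf{b}_{DG} = \mathbf{P}^{-1}\mathbf{G}(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d} = \mathbf{P}^{-1}\mathbf{G}(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{G}\mathbf{b} = \mathbf{b}$. In
Eq. (3.24) we are assuming that the traits in the index are the same as those in the net
genetic merit. However, this may not be the case, that is, the number of traits could
be different from the number of genotypes. In the latter case, Eq. (3.21) should be
written as $\mathbf{G}'\mathbf{b} = \mathbf{d}$ and Eq. (3.24) as $\mathbf{b}_{DG} = \mathbf{P}^{-1}\mathbf{G}(\mathbf{G}'\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d}$ (Itoh and Yamada
1986).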
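
The step from Eq. (3.23) to Eq. (3.24) can be sketched as follows, assuming that $\mathbf{G}$ is symmetric and that $\mathbf{G}\mathbf{P}^{-1}\mathbf{G}$ is invertible.

```latex
\begin{align*}
\frac{\partial \Phi_{DG}}{\partial \mathbf{b}} &= \mathbf{P}\mathbf{b} + \mathbf{G}\mathbf{v} = \mathbf{0}
  &&\Rightarrow\quad \mathbf{b} = -\mathbf{P}^{-1}\mathbf{G}\mathbf{v}, \\
\frac{\partial \Phi_{DG}}{\partial \mathbf{v}} &= \mathbf{G}\mathbf{b} - \mathbf{d} = \mathbf{0}
  &&\Rightarrow\quad -\mathbf{G}\mathbf{P}^{-1}\mathbf{G}\mathbf{v} = \mathbf{d}
  \quad\Rightarrow\quad \mathbf{v} = -(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d}, \\
\mathbf{b}_{DG} &= -\mathbf{P}^{-1}\mathbf{G}\mathbf{v}
  = \mathbf{P}^{-1}\mathbf{G}(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d}.
\end{align*}
```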

According to Itoh and Yamada (1986, 1988), Eq. (3.24) does not maximize the
correlation between $I$ and $H$ ($\rho_{IH}$) nor the selection response because the covariance
between $I$ and $H$ is not defined, given that $Cov(H, I) = \mathbf{w}'\mathbf{G}\mathbf{b}$ requires the economic
weight vector $\mathbf{w}$ and the DG-LPSI does not use economic weights. However, note that
because $\mathbf{G}\mathbf{b} = \mathbf{d}$, the variance of the DG-LPSI is $Var(I_{DG}) = \mathbf{d}'(\mathbf{G}\mathbf{P}^{-1}\mathbf{G})^{-1}\mathbf{d} = \mathbf{b}'\mathbf{P}\mathbf{b}$.
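
The minimum-variance property is easiest to see in the case where the index contains more traits than the vector of desired gains, so that the constraint $\mathbf{G}'\mathbf{b} = \mathbf{d}$ has many solutions. The sketch below is illustrative only: P and the rectangular G are randomly generated, with four index traits and two traits with desired gains (our assumption about the layout). It checks the constraint, the variance formula, and that another feasible vector has a larger variance.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
P = A @ A.T + 4.0 * np.eye(4)             # illustrative phenotypic covariance, 4 index traits
G = rng.normal(size=(4, 2))               # illustrative 4 x 2 covariance with 2 merit traits
d = np.array([7.0, -3.0])                 # desired gains

Pinv_G = np.linalg.solve(P, G)
S = G.T @ Pinv_G                          # G'P^{-1}G
b_DG = Pinv_G @ np.linalg.solve(S, d)     # b_DG = P^{-1}G(G'P^{-1}G)^{-1}d

var_DG = b_DG @ P @ b_DG
# Any other b with G'b = d is b_DG plus a vector z in the null space of G'.
z = rng.normal(size=4)
z -= G @ np.linalg.lstsq(G, z, rcond=None)[0]   # remove the component in the column space of G
b_alt = b_DG + z
var_alt = b_alt @ P @ b_alt
print(var_DG, var_alt)
```

Since the expected gains are $k_I\mathbf{d}/\sigma_I$, the smaller variance of `b_DG` translates into larger gains in the desired direction.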

In practice, $\mathbf{d}$ is chosen arbitrarily and then we are in the same situation as when
economic weights need to be selected. Pesek and Baker (1969), Yamada et al.
(1975), and Itoh and Yamada (1986, 1988) argued that this should not be a problem
for experienced breeders because they must know the relative merits and demerits of
their strains. However, this may be true only for some breeders and the selection of
$\mathbf{d}$ is always subjective. Another problem with this index is that, as it is not associated
with $H = \mathbf{w}'\mathbf{g}$, it is not a predictor of $H = \mathbf{w}'\mathbf{g}$.


3.4 Applicability of the LPSI, RLPSI, and PPG-LPSI 


In the context of animal breeding, Hazel (1943) pointed out that because any index is 
constructed from data on a herd in one locality, it may not be widely applicable. The 
reasons for this are: 


1. Relative economic values for a trait may vary according to the particular locality 
or nature of the enterprise. 

2. The genetic constitution of herds may differ, especially when they are under 
distinctly nonrandom mating systems such as intense inbreeding. 

3. Different managerial practices may cause standard deviations for the traits to vary 
in different herds. The standard deviations for subjective traits such as market 
conformation measured by judging or by scores may vary because different 
judges vary the range over which they spread their scores. 

4. Few herds are large enough to provide enough data to make the sampling errors of
the genetic constants small.

These limitations are applicable to the LPSI, RLPSI, and PPG-LPSI, and to all
selection indices described in this book.


Chapter 4
Linear Marker and Genome-Wide Selection
Indices


Abstract There are two main linear marker selection indices employed in marker- 
assisted selection (MAS) to predict the net genetic merit and to select individual 
candidates as parents for the next generation: the linear marker selection index 
(LMSI) and the genome-wide LMSI (GW-LMSI). Both indices maximize the 
selection response, the expected genetic gain per trait, and the correlation with the 
net genetic merit; however, applying the LMSI in plant or animal breeding requires 
genotyping the candidates for selection; performing a linear regression of phenotypic 
values on the coded values of the markers such that the selected markers are 
statistically linked to quantitative trait loci that explain most of the variability in 
the regression model; constructing the marker score, and combining the marker score 
with phenotypic information to predict and rank the net genetic merit of the 
candidates for selection. On the other hand, the GW-LMSI is a single-stage proce- 
dure that treats information at each individual marker as a separate trait. Thus, all 
marker information can be entered together with phenotypic information into the 
GW-LMSI, which is then used to predict the net genetic merit and select candidates. 
We describe the LMSI and GW-LMSI theory and show that both indices are direct 
applications of the linear phenotypic selection index theory to MAS. Using real and 
simulated data we validated the theory of both indices. 


4.1 The Linear Marker Selection Index


4.1.1 Basic Conditions for Constructing the LMSI 


In Chap. 2, Sect. 2.1, we indicated ten basic conditions for constructing a valid linear
phenotypic selection index (LPSI). These ten conditions are also necessary for the
linear marker selection index (LMSI); however, in addition to those conditions, the
LMSI also requires the following conditions:


1. The markers and the quantitative trait loci (QTL) should be in linkage disequi- 
librium in the population under selection. 
2. The QTL effects should be combined additively both within and between loci. 
3. The QTL should be in coupling mode, that is, one of the initial lines should have
all the alleles that have a positive effect on the chromosome, and the other line
should have all the alleles that have a negative effect.

4. The traits of interest should be affected by a few QTL with large effects (and 
possibly a number of very small QTL effects) rather than many small QTL 
effects. 

5. The heritability of the traits should be low. 

6. Markers correlated with the traits of interest should be identified. 


Under these conditions, the LMSI should be more efficient than the LPSI, at least 
in the first selection cycles (Whittaker 2003; Moreau et al. 2007). 


4.1.2 The LMSI Parameters


Let $y_i = g_i + e_i$ be the ith trait ($i = 1, 2, \ldots, t$; $t$ = number of traits), where
$e_i \sim N(0, \sigma_e^2)$ is the residual with expectation equal to zero and variance $\sigma_e^2$,
and $N$ stands for the normal distribution. Assuming that the QTL effects combine additively
both within and between loci, the ith unobservable genetic value $g_i$ can be written as

$$g_i = \sum_{k=1}^{N_Q} \alpha_k q_k, \qquad (4.1)$$

where $\alpha_k$ is the effect of the kth QTL, $q_k$ is the number of favorable alleles at the kth
QTL (2, 1 or 0), and $N_Q$ is the number of QTL affecting the ith trait of interest.

If the QTL effect values are not observable, the $g_i$ values in Eq. (4.1) are also not
observable; however, we can use a linear combination ($s_i$) of the markers linked to the
QTL that affect the ith trait to predict the $g_i$ value as

$$s_i = \sum_{j=1}^{M} \theta_j x_j, \qquad (4.2)$$

where $s_i$ is a predictor of $g_i$, $\theta_j$ is the regression coefficient of the linear regression
model, $x_j$ is the coded value of the jth marker (e.g., 1, 0, and -1 for marker
genotypes AA, Aa, and aa respectively), and $M$ is the number of selected markers
linked to the QTL that affect the ith trait. Equation (4.2) is called the marker score
(Lande and Thompson 1990; Whittaker 2003) and this is the main reason why the
LMSI is not equal to the LPSI described in Chap. 2. The number of selected markers
is only a subset of potential markers linked to QTL in the population under selection;
thus, the $s_i$ values should be lower than or equal to the $g_i$ values. One way of
estimating the $s_i$ values is to perform a linear regression of phenotypic values on
the coded values of the markers, select the markers that are statistically linked to
quantitative trait loci and that explain most of the variability in the regression model, and
then obtain the estimated value of $s_i$ ($\hat{s}_i$) as the sum of the products of the estimated
effects of the selected markers and the marker coded values associated with the ith
trait. Some authors (e.g., Moreau et al. 2007) call it the molecular score; in this book,
we call $s_i$ the marker score and $\hat{s}_i$ the estimated marker score.

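The objective of the LMSI is to predict the net genetic merit of each individual
and select the individuals with the highest net genetic merit for further breeding. In
the LMSI context, the net genetic merit can be written as

$$H = \mathbf{w}'\mathbf{g} + \mathbf{w}_2'\mathbf{s} = \begin{bmatrix}\mathbf{w}' & \mathbf{w}_2'\end{bmatrix}\begin{bmatrix}\mathbf{g}\\ \mathbf{s}\end{bmatrix} = \mathbf{a}'\mathbf{z}, \qquad (4.3)$$

where $\mathbf{g}' = [g_1 \; g_2 \; \cdots \; g_t]$ is the vector of breeding values; $\mathbf{w}' = [w_1 \; \cdots \; w_t]$ is
the vector of economic weights associated with $\mathbf{g}$; $\mathbf{w}_2' = [0_1 \; \cdots \; 0_t]$ is a null
vector associated with the vector of marker scores $\mathbf{s}' = [s_1 \; \cdots \; s_t]$; $s_i$ is the ith
marker score; $\mathbf{a}' = [\mathbf{w}' \;\; \mathbf{w}_2']$ and $\mathbf{z}' = [\mathbf{g}' \;\; \mathbf{s}']$.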


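The information provided by the marker score can be used in breeding programs
to increase the accuracy of predicting the net genetic merit of the individuals under
selection. The LMSI combines the phenotypic values and marker scores to predict H in each
selection cycle and can be written as

$$I_M = \boldsymbol{\beta}_y'\mathbf{y} + \boldsymbol{\beta}_s'\mathbf{s} = \begin{bmatrix}\boldsymbol{\beta}_y' & \boldsymbol{\beta}_s'\end{bmatrix}\begin{bmatrix}\mathbf{y}\\ \mathbf{s}\end{bmatrix} = \boldsymbol{\beta}'\mathbf{t}, \qquad (4.4)$$

where $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_s$ are vectors of phenotypic and marker score weights respectively;
$\mathbf{y}' = [y_1 \; \cdots \; y_t]$ is the vector of trait phenotypic values and $\mathbf{s}$ was defined in
Eq. (4.3); $\boldsymbol{\beta}' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_s']$ and $\mathbf{t}' = [\mathbf{y}' \;\; \mathbf{s}']$.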
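The LMSI selection response can be written as

$$R_M = k_I \sigma_H \rho_{HI_M} = k_I \frac{\mathbf{a}'\mathbf{Z}_M\boldsymbol{\beta}}{\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}, \qquad (4.5)$$

where $k_I$ is the standardized selection differential of the LMSI, $\sigma_H = \sqrt{\mathbf{a}'\mathbf{Z}_M\mathbf{a}}$ and
$\sigma_{I_M} = \sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$ are the standard deviations of $H$ and $I_M$, whereas
$\rho_{HI_M} = \dfrac{\mathbf{a}'\mathbf{Z}_M\boldsymbol{\beta}}{\sqrt{\mathbf{a}'\mathbf{Z}_M\mathbf{a}}\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}$ and $\mathbf{a}'\mathbf{Z}_M\boldsymbol{\beta}$ are the correlation and the
covariance between $H$ and $I_M$ respectively;
$\mathbf{T}_M = Var\begin{bmatrix}\mathbf{y}\\ \mathbf{s}\end{bmatrix} = \begin{bmatrix}\mathbf{P} & \mathbf{S}\\ \mathbf{S} & \mathbf{S}\end{bmatrix}$ and
$\mathbf{Z}_M = Var\begin{bmatrix}\mathbf{g}\\ \mathbf{s}\end{bmatrix} = \begin{bmatrix}\mathbf{C} & \mathbf{S}\\ \mathbf{S} & \mathbf{S}\end{bmatrix}$ are block covariance matrices,
where $\mathbf{P} = Var(\mathbf{y})$, $\mathbf{S} = Var(\mathbf{s})$, and $\mathbf{C} = Var(\mathbf{g})$ are the covariance
matrices of phenotypic values ($\mathbf{y}$), the marker score ($\mathbf{s}$), and the genetic value ($\mathbf{g}$)
respectively in the population. Vectors $\mathbf{a}$ and $\boldsymbol{\beta}$ were defined in Eqs. (4.3) and (4.4)
respectively.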

The LMSI expected genetic gain per trait can be written as 

$$\mathbf{E}_M = k_I \frac{\mathbf{Z}_M\boldsymbol{\beta}}{\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}. \qquad (4.6)$$


All the parameters in Eq. (4.6) were previously defined. 


4.1.3 The Maximized LMSI Parameters 


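Suppose that $\mathbf{P}$, $\mathbf{S}$, and $\mathbf{C}$ are known matrices; then, matrices $\mathbf{T}_M$ and $\mathbf{Z}_M$ are known
and, according to the LPSI theory (see Chap. 2 for details), the LMSI vector of
coefficients ($\boldsymbol{\beta}$) that maximizes $\rho_{HI_M}$, $R_M$, and $\mathbf{E}_M$ can be written as

$$\boldsymbol{\beta} = \mathbf{T}_M^{-1}\mathbf{Z}_M\mathbf{a}, \qquad (4.7)$$

whence the maximized selection response and the maximized correlation (or LMSI
accuracy) between $H$ and $I_M$ can be written as

$$R_M = k_I\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}} \qquad (4.8a)$$

and

$$\rho_{HI_M} = \frac{\sigma_{I_M}}{\sigma_H} \qquad (4.8b)$$

respectively, where $\sigma_{I_M} = \sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$ is the standard deviation of $I_M$
and $\sigma_H = \sqrt{\mathbf{a}'\mathbf{Z}_M\mathbf{a}}$ is the standard deviation of $H$. Equations (4.8a) and (4.8b)
show that the LMSI is a direct application of the LPSI theory in the marker-assisted
selection (MAS) context.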
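
As a numerical illustration of Eqs. (4.7), (4.8a), and (4.8b), the following Python sketch builds the two block matrices and computes the LMSI coefficients, selection response, and accuracy. The covariance matrices, the economic weights, and the selection intensity (1.755, roughly the top 10% selected) are made-up values chosen only for illustration; they are not taken from the data used in this chapter, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical two-trait inputs (illustrative values, not from the text).
P = np.array([[10.0, 2.0], [2.0, 8.0]])    # Var(y), phenotypic covariance
C = np.array([[4.0, 1.0], [1.0, 3.0]])     # Var(g), genetic covariance
S = 0.5 * C                                # Var(s), assumed marker-score covariance
w = np.array([1.0, 2.0])                   # economic weights
k_I = 1.755                                # assumed selection intensity

T_M = np.block([[P, S], [S, S]])           # Var([y, s])
Z_M = np.block([[C, S], [S, S]])           # Var([g, s])
a = np.concatenate([w, np.zeros(len(w))])  # a' = [w' 0']

beta = np.linalg.solve(T_M, Z_M @ a)       # Eq. (4.7): beta = T_M^{-1} Z_M a
sigma_I = np.sqrt(beta @ T_M @ beta)       # standard deviation of I_M
sigma_H = np.sqrt(a @ Z_M @ a)             # standard deviation of H
R_M = k_I * sigma_I                        # Eq. (4.8a)
rho_HI = sigma_I / sigma_H                 # Eq. (4.8b)

print("beta   =", np.round(beta, 4))
print("R_M    =", round(float(R_M), 4))
print("rho_HI =", round(float(rho_HI), 4))
```

With these inputs the variance of the index equals its covariance with H, and the marker-score weights equal the economic weights minus the phenotypic weights, as the theory predicts.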
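Let $\mathbf{Q} = \mathbf{T}_M^{-1}\mathbf{Z}_M$; then, matrix $\mathbf{Q}$ can be written as

$$\mathbf{Q} = \begin{bmatrix}(\mathbf{P}-\mathbf{S})^{-1}(\mathbf{C}-\mathbf{S}) & \mathbf{0}\\ \mathbf{I}-(\mathbf{P}-\mathbf{S})^{-1}(\mathbf{C}-\mathbf{S}) & \mathbf{I}\end{bmatrix}, \qquad (4.9)$$

whence $\boldsymbol{\beta} = \mathbf{Q}\mathbf{a}$, and as $\mathbf{w}_2' = [0_1 \; \cdots \; 0_t]$, we can write the two vectors of
$\boldsymbol{\beta}' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_s']$ as

$$\boldsymbol{\beta}_y = (\mathbf{P}-\mathbf{S})^{-1}(\mathbf{C}-\mathbf{S})\mathbf{w} \quad \text{and} \quad \boldsymbol{\beta}_s = \left[\mathbf{I}-(\mathbf{P}-\mathbf{S})^{-1}(\mathbf{C}-\mathbf{S})\right]\mathbf{w}. \qquad (4.10a)$$

Another way of writing the marker score vector weights is

$$\boldsymbol{\beta}_s = \mathbf{w} - \boldsymbol{\beta}_y, \qquad (4.10b)$$

where $\boldsymbol{\beta}_y = (\mathbf{P}-\mathbf{S})^{-1}(\mathbf{C}-\mathbf{S})\mathbf{w}$. By Eq. (4.10b), the optimal LMSI can be written as

$$I_M = \mathbf{w}'\mathbf{s} + \boldsymbol{\beta}_y'(\mathbf{y} - \mathbf{s}). \qquad (4.11)$$

Equation (4.11) indicates that, in practice, to estimate the optimal LMSI, we only
need to estimate the vector of coefficients $\boldsymbol{\beta}_y$. By Eq. (4.10a), Eq. (4.8a) can be
written as

$$R_M = k_I\sqrt{\mathbf{w}'\mathbf{C}(\mathbf{P}-\mathbf{S})^{-1}(\mathbf{C}-\mathbf{S})\mathbf{w} + \mathbf{w}'\mathbf{S}\left[\mathbf{I}-(\mathbf{P}-\mathbf{S})^{-1}(\mathbf{C}-\mathbf{S})\right]\mathbf{w}}. \qquad (4.12)$$

Thus, by Eqs. (4.10a) and (4.12), when $\mathbf{S}$ is a null matrix, vector $\boldsymbol{\beta}_y$ is equal to
$\boldsymbol{\beta}_y = \mathbf{P}^{-1}\mathbf{C}\mathbf{w} = \mathbf{b}$ and $R_M = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = R_I$, which are the LPSI vector of
coefficients and its selection response respectively.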
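
The closed forms in Eqs. (4.10a), (4.10b), and (4.12) can be checked numerically. The sketch below is a minimal illustration with made-up matrices (hypothetical values, NumPy assumed); it evaluates the response when the marker-score covariance is null, when it is half of the genetic covariance, and when it equals the genetic covariance.

```python
import numpy as np

def lmsi_weights(P, C, S, w):
    """Return (beta_y, beta_s) from Eqs. (4.10a) and (4.10b)."""
    beta_y = np.linalg.solve(P - S, (C - S) @ w)
    beta_s = w - beta_y
    return beta_y, beta_s

def lmsi_response(P, C, S, w, k_I=1.0):
    """Selection response of Eq. (4.12), written as k_I * sqrt(w'C beta_y + w'S beta_s)."""
    beta_y, beta_s = lmsi_weights(P, C, S, w)
    return k_I * np.sqrt(w @ C @ beta_y + w @ S @ beta_s)

# Hypothetical inputs (illustrative values only).
P = np.array([[10.0, 2.0], [2.0, 8.0]])
C = np.array([[4.0, 1.0], [1.0, 3.0]])
w = np.array([1.0, 2.0])
null = np.zeros_like(C)

b = np.linalg.solve(P, C @ w)                  # LPSI coefficients, b = P^{-1} C w
R_lpsi = np.sqrt(b @ P @ b)                    # LPSI response with k_I = 1
beta_y0, beta_s0 = lmsi_weights(P, C, null, w)
R_no_markers = lmsi_response(P, C, null, w)    # S = 0
R_half = lmsi_response(P, C, 0.5 * C, w)       # S = C / 2
R_all = lmsi_response(P, C, C, w)              # limiting case S = C

print(round(float(R_no_markers), 4), round(float(R_half), 4), round(float(R_all), 4))
```

In this example the response with a null marker-score covariance coincides with the LPSI response, and the three responses increase as the marker score captures more of the genetic covariance.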

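Assume that when the number of markers and genotypes tend to infinity, $\mathbf{S}$ tends
to $\mathbf{C}$; then, at the limit, we can suppose that $\mathbf{S} = \mathbf{C}$, and by this latter result, $R_M$ is
equal to

$$R_M = k_I\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}. \qquad (4.13)$$

That is, Eq. (4.13) is the maximum value of the LMSI selection response when
the numbers of markers and genotypes tend to infinity. Thus, the possible LMSI
selection response values of Eq. (4.12) should be between $k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ and $k_I\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$,
i.e.,

$$k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} \le R_M \le k_I\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}, \qquad (4.14)$$

or, dividing by the LPSI response $R_I = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$, the ratio $R_M/R_I$ should be between 1
and $\dfrac{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} = \dfrac{\sigma_H}{\sigma_I}$, that is,

$$1 \le \frac{R_M}{R_I} \le \frac{\sigma_H}{\sigma_I}. \qquad (4.15)$$

Note that $\dfrac{\sigma_H}{\sigma_I} = \dfrac{1}{\rho_{HI}}$, where $\rho_{HI}$ is the maximized correlation between the net genetic
merit ($H$) and the LPSI ($I$) described in Chap. 2. Equation (4.15) indicates that the upper
bound of LMSI efficiency tends to infinity when the $\rho_{HI}$ value tends to zero; this is an
additional way of expressing the paradox of LMSI efficiency described by Knapp (1998).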


4.1.4 The LMSI for One Trait 


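For the one-trait case, matrices $\mathbf{T}_M$, $\mathbf{Z}_M$, and $\mathbf{Q}$ can be written as

$$\mathbf{T}_M = \begin{bmatrix}\sigma_y^2 & \sigma_s^2\\ \sigma_s^2 & \sigma_s^2\end{bmatrix}, \quad \mathbf{Z}_M = \begin{bmatrix}\sigma_g^2 & \sigma_s^2\\ \sigma_s^2 & \sigma_s^2\end{bmatrix}, \quad \text{and} \quad \mathbf{Q} = \begin{bmatrix}\dfrac{\sigma_g^2-\sigma_s^2}{\sigma_y^2-\sigma_s^2} & 0\\[2ex] \dfrac{\sigma_y^2-\sigma_g^2}{\sigma_y^2-\sigma_s^2} & 1\end{bmatrix}, \qquad (4.16)$$

where $\sigma_y^2$, $\sigma_g^2$, and $\sigma_s^2$ are the phenotypic, genetic, and marker score variances
respectively. By Eqs. (4.10a) and (4.10b), when $\mathbf{a}' = [1 \;\; 0]$, the elements of vector
$\boldsymbol{\beta} = \mathbf{Q}\mathbf{a}$ are

$$\beta_y = \frac{\sigma_g^2-\sigma_s^2}{\sigma_y^2-\sigma_s^2} \quad \text{and} \quad \beta_s = 1 - \beta_y, \qquad (4.17a)$$

whence the optimal LMSI can be written as

$$I_M = s + \beta_y(y - s), \qquad (4.17b)$$

whereas by Eq. (4.12), the maximized LMSI selection response can be written as

$$R_M = k_I\sqrt{\sigma_s^2 + \frac{(\sigma_g^2-\sigma_s^2)^2}{\sigma_y^2-\sigma_s^2}}. \qquad (4.18)$$

When $\sigma_s^2 = 0$, $\beta_y = \dfrac{\sigma_g^2}{\sigma_y^2} = h^2$, $I_M = h^2 y$, and $R_M = k_I\dfrac{\sigma_g^2}{\sigma_y} = k_I\sigma_y h^2 = R$, the selection
response for the one-trait case without markers.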
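
The one-trait formulas in Eqs. (4.17a) and (4.18) are simple enough to evaluate directly. The following Python sketch uses hypothetical variances (a phenotypic variance of 100 and a genetic variance of 40, chosen only for illustration) and shows how the weights and the response change with the marker score variance.

```python
import math

def lmsi_one_trait(var_y, var_g, var_s, k_I=1.0):
    """One-trait LMSI weights (Eq. 4.17a) and selection response (Eq. 4.18)."""
    beta_y = (var_g - var_s) / (var_y - var_s)
    beta_s = 1.0 - beta_y
    response = k_I * math.sqrt(var_s + (var_g - var_s) ** 2 / (var_y - var_s))
    return beta_y, beta_s, response

# Hypothetical variances (illustrative values only).
var_y, var_g = 100.0, 40.0

for var_s in (0.0, 20.0, 40.0):
    beta_y, beta_s, R_M = lmsi_one_trait(var_y, var_g, var_s)
    print(f"var_s={var_s:5.1f}  beta_y={beta_y:.3f}  beta_s={beta_s:.3f}  R_M={R_M:.3f}")
```

With a marker score variance of zero the phenotypic weight equals the heritability (0.4 here) and the response reduces to the phenotypic selection response; when the marker score variance equals the genetic variance, all the weight is placed on the marker score.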


4.1.5 Efficiency of LMSI Versus LPSI Efficiency for One 
Trait 


Suppose that the intensity of selection is the same in both indices; then, to compare
LMSI versus LPSI efficiency for predicting the net genetic merit, we can use the
ratio $\lambda_M = \dfrac{R_M}{R_I} = \dfrac{\rho_{HI_M}}{\rho_{HI}}$ (Bulmer 1980; Moreau et al. 1998), where $R_I$ is the maximized
LPSI selection response. In percentage terms, the LMSI versus LPSI efficiency can
be written as

$$p_M = 100(\lambda_M - 1). \qquad (4.19)$$

When $p_M = 0$, the efficiency of both indices is the same; when $p_M > 0$, the
efficiency of the LMSI is higher than that of the LPSI, and when $p_M < 0$, LPSI
efficiency is higher than LMSI efficiency for predicting the net genetic merit.

In the case of one trait, Lande and Thompson (1990) showed that LMSI efficiency
(not in percentage terms) with respect to phenotypic efficiency can be written as

$$\frac{R_M}{R} = \sqrt{\frac{q}{h^2} + \frac{(1-q)^2}{1-qh^2}}, \qquad (4.20)$$

where $R_M$ was defined in Eq. (4.18), $R = k_I\sigma_y h^2$, $h^2$ is the trait heritability, and
$q = \dfrac{\sigma_s^2}{\sigma_g^2}$ is the proportion of additive genetic variance explained by the markers. According to
Eq. (4.20), the advantage of the LMSI over phenotypic selection increases as the
population size increases and heritability decreases, because in such cases $q = \dfrac{\sigma_s^2}{\sigma_g^2}$
tends to 1 and Eq. (4.20) approaches $\dfrac{1}{h}$. Therefore, the LMSI is most efficient for traits
with low heritability and when the marker score explains a large proportion of the
genetic variance. Note that when $h^2$ tends to zero, $\dfrac{1}{h}$ tends to infinity; this means
that in the asymptotic context, LMSI efficiency with respect to phenotypic efficiency
for one trait (Eq. 4.20) tends to infinity, and this is the LMSI paradox pointed out by
Knapp (1998). There are other problems associated with the LMSI: it increases the
selection response only in the short term and can result in lower cumulative
responses in the longer term than phenotypic selection, as the LMSI fixes the QTL
at a faster rate than phenotypic selection. In addition, it requires the weights
(Eq. 4.17a) to be updated, because in each generation the frequency of the QTL
changes (Dekkers and Settar 2004).
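
To make the behavior of Eq. (4.20) concrete, the short Python sketch below evaluates the relative efficiency of the LMSI over phenotypic selection for a few heritabilities and proportions of genetic variance explained by the markers. The grid of values is arbitrary and serves only as an illustration.

```python
import math

def lmsi_relative_efficiency(q, h2):
    """Eq. (4.20): LMSI response divided by the phenotypic selection response."""
    return math.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))

for h2 in (0.05, 0.2, 0.5):
    row = [round(lmsi_relative_efficiency(q, h2), 3) for q in (0.0, 0.5, 1.0)]
    print(f"h2={h2}: efficiency at q = 0, 0.5, 1 ->", row)
```

When the markers explain none of the genetic variance the ratio is 1, and when they explain all of it the ratio equals the reciprocal of the square root of the heritability, which grows without bound as the heritability approaches zero.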


4.1.6 Statistical LMSI Properties 


Assume that $H$ and $I_M$ have a bivariate joint normal distribution, $\boldsymbol{\beta} = \mathbf{T}_M^{-1}\mathbf{Z}_M\mathbf{a}$, and
that $\mathbf{P}$, $\mathbf{C}$, $\mathbf{S}$, and $\mathbf{w}$ are known; then, the statistical LMSI properties are the same as
the LPSI properties described in Chap. 2. That is,

1. $\sigma_{I_M}^2 = \sigma_{HI_M}$: the variance of $I_M$ ($\sigma_{I_M}^2$) and the covariance between $H$ and $I_M$ ($\sigma_{HI_M}$)
are the same.

2. The maximized correlation between $H$ and $I_M$ (or $I_M$ accuracy) is $\rho_{HI_M} = \dfrac{\sigma_{I_M}}{\sigma_H}$.

3. The variance of the predicted error, $Var(H - I_M) = (1 - \rho_{HI_M}^2)\sigma_H^2$, is minimal.

4. The total variance of $H$ explained by $I_M$ is $\sigma_{I_M}^2 = \rho_{HI_M}^2\sigma_H^2$.

5. The heritability of $I_M$ is $h_{I_M}^2 = \dfrac{\boldsymbol{\beta}'\mathbf{Z}_M\boldsymbol{\beta}}{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$.

Properties 1 to 4 are the same as LPSI properties 1 to 4, but, because the LMSI
jointly incorporates the phenotypic and marker information to predict the net genetic
merit, LMSI accuracy should be higher than LPSI accuracy. The same is true of the
LMSI selection response and expected genetic gain per trait when compared with the
LPSI selection response and expected genetic gain per trait.
4.2 The Genome-Wide Linear Selection Index


The genome-wide linear marker selection index (GW-LMSI) is a single-stage
procedure that treats information at each individual marker as a separate trait.
Thus, all marker information can be entered together with phenotypic information 
into the GW-LMSI, which is then used to predict the net genetic merit. In a similar 
manner to the LMSI, the GW-LMSI exploits the linkage disequilibrium between 
markers and the QTL produced when inbred lines are crossed. 


4.2.1 The GW-LMSI Parameters 


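In a similar manner to the LPSI, the main objective of the GW-LMSI is to predict the
net genetic merit values of each individual and select the best individuals for further
breeding. In the GW-LMSI context, the net genetic merit can be written as

$$H = \mathbf{w}'\mathbf{g} + \mathbf{w}_2'\mathbf{m} = \begin{bmatrix}\mathbf{w}' & \mathbf{w}_2'\end{bmatrix}\begin{bmatrix}\mathbf{g}\\ \mathbf{m}\end{bmatrix} = \mathbf{a}_W'\mathbf{z}_W, \qquad (4.21)$$

where $\mathbf{g}' = [g_1 \; \cdots \; g_t]$ ($t$ = number of traits) is the vector of
breeding values, $\mathbf{w}' = [w_1 \; \cdots \; w_t]$ is the vector of economic weights associated
with the breeding values, and $\mathbf{w}_2' = [0_1 \; \cdots \; 0_m]$ is a null vector associated with
the coded values of the markers $\mathbf{m}' = [m_1 \; \cdots \; m_m]$, where $m_j$ ($j = 1, 2, \ldots, m$;
$m$ = number of markers) is the jth marker in the training population;
$\mathbf{a}_W' = [\mathbf{w}' \;\; \mathbf{w}_2']$ and $\mathbf{z}_W' = [\mathbf{g}' \;\; \mathbf{m}']$.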

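The GW-LMSI ($I_W$) combines the phenotypic value and the molecular information
linked to the individual traits to predict $H$ values in each selection cycle. It can
be written as

$$I_W = \boldsymbol{\beta}_y'\mathbf{y} + \boldsymbol{\beta}_m'\mathbf{m} = \begin{bmatrix}\boldsymbol{\beta}_y' & \boldsymbol{\beta}_m'\end{bmatrix}\begin{bmatrix}\mathbf{y}\\ \mathbf{m}\end{bmatrix} = \boldsymbol{\beta}_W'\mathbf{t}_W, \qquad (4.22)$$

where $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_m$ are vectors of phenotypic and marker weights respectively;
$\mathbf{y}' = [y_1 \; \cdots \; y_t]$ is the vector of phenotypic values and $\mathbf{m}$ was defined in
Eq. (4.21); $\boldsymbol{\beta}_W' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_m']$ and $\mathbf{t}_W' = [\mathbf{y}' \;\; \mathbf{m}']$.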
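The GW-LMSI selection response can be written as

$$R_W = k_I\sigma_H\rho_{HI_W} = k_I\frac{\mathbf{a}_W'\boldsymbol{\Psi}\boldsymbol{\beta}_W}{\sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W}}, \qquad (4.23a)$$

where $k_I$ is the standardized selection differential of the GW-LMSI, $\sigma_H^2 = \mathbf{a}_W'\boldsymbol{\Psi}\mathbf{a}_W$
and $Var(I_W) = \boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W$ are the variances of $H$ and $I_W$, whereas
$\rho_{HI_W} = \dfrac{\mathbf{a}_W'\boldsymbol{\Psi}\boldsymbol{\beta}_W}{\sqrt{\mathbf{a}_W'\boldsymbol{\Psi}\mathbf{a}_W}\sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W}}$ and $\mathbf{a}_W'\boldsymbol{\Psi}\boldsymbol{\beta}_W$ are the correlation and the covariance between
$H$ and $I_W$ respectively;
$\boldsymbol{\Phi} = Var\begin{bmatrix}\mathbf{y}\\ \mathbf{m}\end{bmatrix} = \begin{bmatrix}\mathbf{P} & \mathbf{W}'\\ \mathbf{W} & \mathbf{M}\end{bmatrix}$ and
$\boldsymbol{\Psi} = Var\begin{bmatrix}\mathbf{g}\\ \mathbf{m}\end{bmatrix} = \begin{bmatrix}\mathbf{C} & \mathbf{W}'\\ \mathbf{W} & \mathbf{M}\end{bmatrix}$ are block covariance matrices, where $\mathbf{P} = Var(\mathbf{y})$,
$\mathbf{M} = Var(\mathbf{m})$, and $\mathbf{C} = Var(\mathbf{g})$ are the covariance
matrices of phenotypic values ($\mathbf{y}$), the molecular marker ($\mathbf{m}$) coded values, and the
genetic ($\mathbf{g}$) values, whereas $\mathbf{W} = Cov(\mathbf{y}, \mathbf{m}) = Cov(\mathbf{g}, \mathbf{m})$ is the covariance matrix
between $\mathbf{y}$ and $\mathbf{m}$, and between $\mathbf{g}$ and $\mathbf{m}$. The size of matrices $\mathbf{P}$ and $\mathbf{C}$ is $t \times t$, but the
sizes of matrices $\mathbf{M}$ and $\mathbf{W}$ are $m \times m$ and $m \times t$ respectively.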

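From a theoretical point of view, Crossa and Cerón-Rojas (2011) showed that
matrix $\mathbf{M}$ can be written as

$$\mathbf{M} = \begin{bmatrix}1 & (1-2\delta_{12}) & \cdots & (1-2\delta_{1m})\\ (1-2\delta_{21}) & 1 & \cdots & (1-2\delta_{2m})\\ \vdots & \vdots & \ddots & \vdots\\ (1-2\delta_{m1}) & (1-2\delta_{m2}) & \cdots & 1\end{bmatrix}, \qquad (4.23b)$$

where $(1 - 2\delta_{ij})$ is the covariance (or correlation) and $\delta_{ij}$ the recombination
frequency between the ith and jth marker ($i, j = 1, 2, \ldots, m$; $m$ = number of markers).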
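According to Crossa and Cerón-Rojas (2011), matrix $\mathbf{W}$ can be written as

$$\mathbf{W} = \begin{bmatrix}(1-2r_{11})\alpha_{q1} & (1-2r_{12})\alpha_{q2} & \cdots & (1-2r_{1N_Q})\alpha_{qN_Q}\\ (1-2r_{21})\alpha_{q1} & (1-2r_{22})\alpha_{q2} & \cdots & (1-2r_{2N_Q})\alpha_{qN_Q}\\ \vdots & \vdots & \ddots & \vdots\\ (1-2r_{m1})\alpha_{q1} & (1-2r_{m2})\alpha_{q2} & \cdots & (1-2r_{mN_Q})\alpha_{qN_Q}\end{bmatrix}, \qquad (4.23c)$$

where $(1 - 2r_{ik})\alpha_{qk}$ ($i = 1, 2, \ldots, m$; $k = 1, 2, \ldots, N_Q$ = number of QTL; $q = 1,
2, \ldots, t$) is the covariance between the qth trait and the ith marker; $r_{ik}$ is the
recombination frequency between the ith marker and the kth QTL; and $\alpha_{qk}$ is the
effect of the kth QTL over the qth trait.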
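The GW-LMSI expected genetic gain per trait can be written as

$$\mathbf{E}_W = k_I\frac{\boldsymbol{\Psi}\boldsymbol{\beta}_W}{\sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W}}. \qquad (4.24)$$

All parameters in Eq. (4.24) were previously defined.

Matrix $\boldsymbol{\Phi}$ could be singular, i.e., its inverse ($\boldsymbol{\Phi}^{-1}$) may not exist because matrix
$\mathbf{W}$ is singular. Suppose that matrices $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$ are known; then, according to the
LPSI theory, the GW-LMSI vector of coefficients ($\boldsymbol{\beta}_W$) that maximizes $\rho_{HI_W}$ can be
written as

$$\boldsymbol{\beta}_W = \boldsymbol{\Phi}^{-}\boldsymbol{\Psi}\mathbf{a}_W, \qquad (4.25a)$$

where matrix $\boldsymbol{\Phi}^{-}$ denotes a generalized inverse of $\boldsymbol{\Phi}$. By Eq. (4.25a), the maximized
GW-LMSI selection response is

$$R_W = k_I\sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W}. \qquad (4.25b)$$

Equations (4.25a) and (4.25b) show that the GW-LMSI is a direct application of
the LPSI to MAS. By Eq. (4.25a), the maximized correlation between $H$ and $I_W$ is

$$\rho_{HI_W} = \frac{\sigma_{I_W}}{\sigma_H}, \qquad (4.25c)$$

where $\sigma_{I_W} = \sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W}$ is the standard deviation of $I_W$ and
$\sigma_H = \sqrt{\mathbf{a}_W'\boldsymbol{\Psi}\mathbf{a}_W}$ is the standard deviation of $H$.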
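
The GW-LMSI coefficients of Eq. (4.25a) can be sketched numerically as follows. To keep the two block matrices mutually consistent, the example generates them from an assumed marker model in which the genetic value is a linear function of three markers plus an independent genetic residual; all numbers are hypothetical. The Moore-Penrose pseudoinverse from NumPy is used as one particular generalized inverse, which is a choice made for this sketch rather than something the text prescribes.

```python
import numpy as np

# Hypothetical setup: t = 2 traits, m = 3 markers (illustrative values only).
M = np.array([[1.0, 0.8, 0.5],
              [0.8, 1.0, 0.8],
              [0.5, 0.8, 1.0]])        # Var(m), marker covariance (m x m)
A = np.array([[1.0, 0.2],
              [0.0, 0.5],
              [0.5, 0.0]])             # assumed marker effects on the traits (m x t)
U = np.diag([1.0, 0.5])                # genetic variance not captured by markers
E = np.diag([4.0, 3.0])                # residual variance

W = M @ A                              # Cov(m, g) = Cov(m, y), size m x t
C = A.T @ M @ A + U                    # Var(g)
P = C + E                              # Var(y)
w = np.array([1.0, 1.0])               # economic weights

Phi = np.block([[P, W.T], [W, M]])     # Var([y, m])
Psi = np.block([[C, W.T], [W, M]])     # Var([g, m])
a_W = np.concatenate([w, np.zeros(M.shape[0])])

beta_W = np.linalg.pinv(Phi) @ Psi @ a_W      # Eq. (4.25a) with a pseudoinverse
sigma_I = np.sqrt(beta_W @ Phi @ beta_W)      # standard deviation of I_W
sigma_H = np.sqrt(a_W @ Psi @ a_W)            # standard deviation of H
rho_W = sigma_I / sigma_H                     # Eq. (4.25c)

b = np.linalg.solve(P, C @ w)                 # LPSI coefficients for comparison
rho_lpsi = np.sqrt(b @ P @ b) / sigma_H

print("beta_W   =", np.round(beta_W, 4))
print("rho_W    =", round(float(rho_W), 4))
print("rho_lpsi =", round(float(rho_lpsi), 4))
```

In this constructed example the GW-LMSI accuracy is at least as high as the LPSI accuracy, because the index uses the marker information in addition to the phenotypes.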


4.2.2 Relationship Between the GW-LMSI and the LPSI 


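Matrix $\boldsymbol{\Phi}^{-}$ can be written as

$$\boldsymbol{\Phi}^{-} = \begin{bmatrix}\mathbf{L}^{-} & -\mathbf{L}^{-}\mathbf{W}'\mathbf{M}^{-}\\ -\mathbf{M}^{-}\mathbf{W}\mathbf{L}^{-} & \mathbf{M}^{-} + \mathbf{M}^{-}\mathbf{W}\mathbf{L}^{-}\mathbf{W}'\mathbf{M}^{-}\end{bmatrix}, \qquad (4.26)$$

where $\mathbf{L}^{-}$ is a generalized inverse of matrix $\mathbf{L} = \mathbf{P} - \mathbf{W}'\mathbf{M}^{-}\mathbf{W}$, and $\mathbf{M}^{-}$ is a
generalized inverse of matrix $\mathbf{M}$. In matrix $\boldsymbol{\Phi}^{-}$, the inverse of matrix $\mathbf{W}$ is not
required and the standard inverse of matrix $\mathbf{M}$ ($\mathbf{M}^{-1}$) may exist. In the latter case, the
standard inverse of matrix $\mathbf{L}$ ($\mathbf{L}^{-1}$) exists and can be written as
$\mathbf{L}^{-1} = (\mathbf{P} - \mathbf{W}'\mathbf{M}^{-1}\mathbf{W})^{-1} = \mathbf{P}^{-1} + \mathbf{P}^{-1}\mathbf{W}'(\mathbf{M} - \mathbf{W}\mathbf{P}^{-1}\mathbf{W}')^{-1}\mathbf{W}\mathbf{P}^{-1}$ (Searle et al. 2006).

By Eq. (4.26) and because $\mathbf{w}_2' = [0_1 \; \cdots \; 0_m]$, the vector components of
$\boldsymbol{\beta}_W' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_m']$, or $\boldsymbol{\beta}_W = \boldsymbol{\Phi}^{-}\boldsymbol{\Psi}\mathbf{a}_W$, can be written as

$$\boldsymbol{\beta}_y = \mathbf{L}^{-}(\mathbf{C} - \mathbf{W}'\mathbf{M}^{-}\mathbf{W})\mathbf{w} \qquad (4.27)$$

and

$$\boldsymbol{\beta}_m = (\mathbf{M}^{-} + \mathbf{M}^{-}\mathbf{W}\mathbf{L}^{-}\mathbf{W}'\mathbf{M}^{-})\mathbf{W}\mathbf{w} - \mathbf{M}^{-}\mathbf{W}\mathbf{L}^{-}\mathbf{C}\mathbf{w}, \qquad (4.28)$$

where $\mathbf{w}$ is the vector of economic weights. Suppose that there is no marker
information; then, matrices $\mathbf{M}$ and $\mathbf{W}$ are null and Eq. (4.27) is equal to
$\boldsymbol{\beta}_y = \mathbf{P}^{-1}\mathbf{C}\mathbf{w} = \mathbf{b}$ (the LPSI vector of coefficients), whereas $\boldsymbol{\beta}_m = \mathbf{0}$ and
$R_W = k_I\sqrt{\boldsymbol{\beta}_W'\boldsymbol{\Phi}\boldsymbol{\beta}_W} = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = R_I$, the LPSI selection response. Now suppose
that the markers explain all the genetic variability; in this case, $\boldsymbol{\beta}_y = \mathbf{0}$ and
$\boldsymbol{\beta}_m = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{Y}$, the matrix of linear regression coefficients in the multivariate context,
where $(\mathbf{X}'\mathbf{X})^{-}$ is a generalized inverse matrix of $\mathbf{X}'\mathbf{X}$ and $\mathbf{Y}$ is a matrix of phenotypic
observations.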
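
The partitioned expressions in Eqs. (4.27) and (4.28) can be compared with the direct solution of Eq. (4.25a). The sketch below is a minimal check under the assumption that the marker covariance matrix and the matrix L are invertible, so that standard inverses can stand in for the generalized inverses; the marker model and all numbers are hypothetical. The second case sets the genetic variance not captured by markers to zero, which mimics markers that explain all the genetic variability.

```python
import numpy as np

def gw_lmsi_weights(P, C, M, W, w):
    """Phenotypic and marker weights from Eqs. (4.27) and (4.28); M and L invertible."""
    M_inv = np.linalg.inv(M)
    L_inv = np.linalg.inv(P - W.T @ M_inv @ W)
    beta_y = L_inv @ (C - W.T @ M_inv @ W) @ w
    beta_m = (M_inv + M_inv @ W @ L_inv @ W.T @ M_inv) @ W @ w - M_inv @ W @ L_inv @ C @ w
    return beta_y, beta_m

# Hypothetical marker model g = A'm + u, y = g + e (illustrative values only).
M = np.array([[1.0, 0.8, 0.5],
              [0.8, 1.0, 0.8],
              [0.5, 0.8, 1.0]])
A = np.array([[1.0, 0.2],
              [0.0, 0.5],
              [0.5, 0.0]])
E = np.diag([4.0, 3.0])
w = np.array([1.0, 1.0])
W = M @ A

def covariances(U):
    """Return (P, C) for a given covariance U of the genetic residual u."""
    C = A.T @ M @ A + U
    return C + E, C

# Case 1: markers explain only part of the genetic variance.
P, C = covariances(np.diag([1.0, 0.5]))
beta_y, beta_m = gw_lmsi_weights(P, C, M, W, w)
Phi = np.block([[P, W.T], [W, M]])
Psi = np.block([[C, W.T], [W, M]])
a_W = np.concatenate([w, np.zeros(3)])
beta_direct = np.linalg.solve(Phi, Psi @ a_W)   # Eq. (4.25a) with a standard inverse

# Case 2: markers explain all the genetic variance (U = 0).
P0, C0 = covariances(np.zeros((2, 2)))
beta_y0, beta_m0 = gw_lmsi_weights(P0, C0, M, W, w)

print(np.round(beta_y, 4), np.round(beta_m, 4))
print(np.round(beta_y0, 4), np.round(beta_m0, 4))
```

In the second case the phenotypic weights vanish and the marker weights reduce to the assumed marker effects combined by the economic weights, in line with the limiting behavior described above.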


4.2.3 Statistical Properties of GW-LMSI 


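Assume that $H$ and $I_W$ have a bivariate joint normal distribution, $\boldsymbol{\beta}_W = \boldsymbol{\Phi}^{-}\boldsymbol{\Psi}\mathbf{a}_W$, and $\mathbf{P}$,
$\mathbf{C}$, $\mathbf{M}$, $\mathbf{W}$, and $\mathbf{w}$ are known; then, the statistical GW-LMSI properties are the same as
the LMSI properties. That is,

1. $\sigma_{I_W}^2 = \sigma_{HI_W}$, i.e., the variance of $I_W$ ($\sigma_{I_W}^2$) and the covariance between $H$ and $I_W$
($\sigma_{HI_W}$) are the same.

2. The maximized correlation between $H$ and $I_W$, or $I_W$ accuracy, is $\rho_{HI_W} = \dfrac{\sigma_{I_W}}{\sigma_H}$.

3. The variance of the predicted error, $Var(H - I_W) = (1 - \rho_{HI_W}^2)\sigma_H^2$, is minimal.

4. The total variance of $H$ explained by $I_W$ is $\sigma_{I_W}^2 = \rho_{HI_W}^2\sigma_H^2$.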

According to Lange and Whittaker (2001), GW-LMSI efficiency should be 
greater than LMSI efficiency. However, this would be true only if matrices P, C, 
M, and W are known and trait heritability is very low. 


4.3 Estimating the LMSI Parameters 


When covariance matrices P, C, and S, and the vector of economic weights (w) are 
known, there is no error in the estimation of the LMSI parameters (selection 
response, expected genetic gain, etc.); the same is true for the GW-LMSI when, in 
addition to P, C, and w, the covariance matrices M and W are known. In such cases, 
the relative efficiency of the LMSI (GW-LMSI) depends only on the heritability of 
the traits and on the portion of phenotypic variation associated with markers. Using 
simulated data, Lange and Whittaker (2001) found that GW-LMSI efficiency was 
higher than LMSI efficiency when trait heritability was 0.2 and matrices P, C, M, 
and W were known. When P, C, S, M, and W are unknown, it is necessary to 
estimate them; then, the LMSI and GW-LMSI vector of coefficients and the effects 
associated with markers are estimated with some error. This error leads to lower 
LMSI and GW-LMSI efficiency than expected under the assumption that the 
parameters are known; however, in the latter case, Lange and Whittaker (2001) 
also found that GW-LMSI efficiency was greater than that of the LMSI when trait 
heritability was 0.05. Moreover, in the LMSI there is additional bias in the estima- 
tion of the parameters because only markers with significant effects are included in 
the index (Moreau et al. 1998). 

In Chap. 2, we described the restricted maximum likelihood (REML) method for 
estimating matrices P and C. Some authors (Lande and Thompson 1990; Charcosset 
and Gallais 1996; Hospital et al. 1997; Moreau et al. 1998, 2007) have described
methods for estimating marker scores, the variance of the marker scores, the LMSI
vector of coefficients, etc., in the context of one trait; however, up to now there have
been no reports on the estimation of matrix $\mathbf{S}$ in the multi-trait case. Lange and
Whittaker (2001) only indicated that matrix $\mathbf{S}$ can be estimated as $\hat{\mathbf{S}} = Var(\hat{\mathbf{s}})$,
where $\hat{\mathbf{s}}$ is a vector of estimated marker scores associated with several individual
traits.
The main problems associated with the estimated LMSI parameters are: 


1. The estimated values of the covariance matrix $\mathbf{S}$ ($\hat{\mathbf{S}}$) tend to overestimate the
genetic covariance matrix ($\mathbf{C}$).
2. The estimated variances of the marker scores can be negative.


When the first point is true, the estimated LMSI selection response and efficiency
could be negative because the estimated matrix
$\hat{\mathbf{T}}_M = \begin{bmatrix}\hat{\mathbf{P}} & \hat{\mathbf{S}}\\ \hat{\mathbf{S}} & \hat{\mathbf{S}}\end{bmatrix}$ is not positive
definite (all eigenvalues positive) and the estimated matrix
$\hat{\mathbf{Z}}_M = \begin{bmatrix}\hat{\mathbf{C}} & \hat{\mathbf{S}}\\ \hat{\mathbf{S}} & \hat{\mathbf{S}}\end{bmatrix}$ is not
positive semi-definite (no negative eigenvalues). In addition, the results can lead to
all weights being placed on the molecular score, and the weights on the phenotypic
values can be negative (Moreau et al. 2007). When the second point is true, the
variance of the marker scores is not useful. The two problems indicated above could
be caused by using the same data set to select markers and to estimate marker effects,
and there is no simple way of solving them. Lande and Thompson (1990) proposed
that the markers used to obtain $\hat{s}$ be selected a priori as those with the most highly
significant partial regression coefficients from among all the markers in the linkage
group analyzed in the previous generation. Zhang and Smith (1992, 1993) proposed
using two independent sets of markers: one to estimate marker effects and the other
to select markers. Additional solutions to these problems were described by Moreau
et al. (2007).
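
A quick diagnostic for the first problem is to inspect the eigenvalues of the estimated block matrices before computing the index. The sketch below does this for the one-trait case with hypothetical variance estimates (illustrative values only, NumPy assumed); the second call uses a marker score variance that exceeds the genetic variance.

```python
import numpy as np

def check_lmsi_matrices(var_y, var_g, var_s):
    """Build the one-trait T_M and Z_M and return their smallest eigenvalues and beta_y."""
    T_M = np.array([[var_y, var_s], [var_s, var_s]])
    Z_M = np.array([[var_g, var_s], [var_s, var_s]])
    min_eig_T = np.linalg.eigvalsh(T_M).min()
    min_eig_Z = np.linalg.eigvalsh(Z_M).min()
    beta_y = (var_g - var_s) / (var_y - var_s)   # Eq. (4.17a)
    return min_eig_T, min_eig_Z, beta_y

# Hypothetical estimates (illustrative values only).
ok = check_lmsi_matrices(var_y=100.0, var_g=40.0, var_s=20.0)
bad = check_lmsi_matrices(var_y=100.0, var_g=40.0, var_s=50.0)  # marker variance above genetic variance

print("consistent estimates:   ", np.round(ok, 3))
print("overestimated var_s:    ", np.round(bad, 3))
```

In the second case of this particular example the genetic block matrix has a negative eigenvalue and the phenotypic weight becomes negative, which is the kind of symptom described above.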

In this subsection, we describe methods (in the univariate and multivariate 
context) for estimating molecular marker effects, marker scores, and their variance 
and covariance, and for estimating the LMSI and GW-LMSI vector of coefficients, 
selection response, expected genetic gain, and accuracy. This subsection is only for 
illustration; we use the same data set to select markers, and to estimate marker effects 
and the variance of marker scores. 


4.3.1 Estimating the Marker Score 


According to Eqs. (4.11) and (4.17b), when the vector of economic weights is equal
to $\mathbf{a}' = [1 \;\; 0]$, the LMSI for the ith trait $y_i$ ($i = 1, 2, \ldots, t$; $t$ = number of traits)
can be written as $I_{M_{il}} = s_{il} + \beta_{y_i}(y_{il} - s_{il})$ ($l = 1, 2, \ldots, n$; $n$ = number of
individuals or genotypes), where
$\beta_{y_i} = \dfrac{\sigma_{g_i}^2 - \sigma_{s_i}^2}{\sigma_{y_i}^2 - \sigma_{s_i}^2} = \dfrac{h_i^2(1 - q_i)}{1 - q_i h_i^2}$ is the LMSI coefficient,
$h_i^2 = \dfrac{\sigma_{g_i}^2}{\sigma_{y_i}^2}$ is the heritability of the ith trait, and $q_i = \dfrac{\sigma_{s_i}^2}{\sigma_{g_i}^2}$ is the proportion of
genetic variance explained by the QTL or markers associated with the ith trait;
$s_i = \sum_{j=1}^{M} \theta_j x_j$ ($j = 1, 2, \ldots, M$; $M$ = number of selected markers) is the ith
individual trait marker score; and $\sigma_{y_i}^2$, $\sigma_{g_i}^2$, and $\sigma_{s_i}^2$ are the ith variances of the
phenotypic, genetic, and marker score values respectively.

The simplest way of estimating the ith marker score $s_i$ is to perform a multiple
linear regression of phenotypic values ($y_i$) on the coded values of the markers ($x_j$) and
then select the markers statistically linked to the ith QTL that explain most of the
variability in the regression model and use them to construct $s_i = \sum_{j \in M} \theta_j x_j$.
We can fit the model $y_i^* = \sum_{j \in M} \theta_j x_j + e_i$, where $y_i^* = y_i - \bar{y}_i$ and $\bar{y}_i$ is the average
value of the ith trait, by maximum likelihood or least squares. When estimating $\theta_j$,
the main problem is to choose the set of markers $M$ based on criteria for declaring
markers as significant and then use the estimated values of $\theta_j$ ($\hat{\theta}_j$) to estimate the ith
marker score $s_i$ as $\hat{s}_i = \sum_{j \in M} \hat{\theta}_j x_j$. The values of $\hat{s}_i$ may increase or decrease according
to the number of markers ($x_j$) included in the model, and $\hat{s}_i$ affects LMSI selection
response and efficiency by means of the estimated variance of $\hat{s}_i$ ($\hat{\sigma}_{s_i}^2$) (Figs. 4.1 and
4.2).

According to the least squares method of estimation, $\hat{\boldsymbol{\theta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}^*$ is an
estimator of the vector of regression coefficients $\boldsymbol{\theta}' = [\theta_1 \; \theta_2 \; \cdots \; \theta_m]$, where
$m$ ($m < n$) is the number of markers, $\mathbf{X}$ is an $n \times m$ matrix of coded marker values
(e.g., 1, 0, and -1 for marker genotypes AA, Aa, and aa respectively), and $\mathbf{y}^*$ is an
$n \times 1$ vector of phenotypic values centered on their average. Only a
subset $M$ ($M < m$) of the $m$ markers is statistically linked to the QTL and then only a
subset $M$ of the estimated vector $\hat{\boldsymbol{\theta}}$ values is selected to estimate $s_i$ as
$\hat{s}_i = \sum_{j=1}^{M} \hat{\theta}_j x_j$.
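
The least squares step can be sketched in a few lines of Python. The example below simulates coded marker data and phenotypes (all values hypothetical), estimates the regression coefficients on the centered phenotypes, and keeps the markers whose t-statistics exceed a threshold. The threshold of 3.0 and the use of t-statistics as the selection criterion are assumptions made for this sketch; the text leaves the choice of significance criterion open.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: n genotypes, m markers coded 1, 0, -1 (hypothetical values).
n, m = 200, 5
X = rng.choice([-1.0, 0.0, 1.0], size=(n, m), p=[0.25, 0.5, 0.25])
theta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0])   # only markers 1 and 3 have effects
y = 50.0 + X @ theta_true + rng.normal(0.0, 1.0, size=n)

# Least squares on centered phenotypes: theta_hat = (X'X)^{-1} X'y*.
y_star = y - y.mean()
theta_hat, *_ = np.linalg.lstsq(X, y_star, rcond=None)

# t-statistics of the estimated coefficients.
residuals = y_star - X @ theta_hat
sigma2_e = residuals @ residuals / (n - m - 1)
std_err = np.sqrt(sigma2_e * np.diag(np.linalg.inv(X.T @ X)))
t_values = theta_hat / std_err

# Keep the markers declared significant and build the estimated marker scores.
selected = np.flatnonzero(np.abs(t_values) > 3.0)   # assumed threshold
s_hat = X[:, selected] @ theta_hat[selected]

print("theta_hat:", np.round(theta_hat, 3))
print("selected markers (0-based):", selected)
```

Selecting markers and estimating their effects from the same data, as done here, is exactly the practice that can bias the estimated marker score variance, so the sketch should be read as an illustration of the mechanics only.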


To illustrate how to obtain $\hat{s}_i = \sum_{j \in M} \hat{\theta}_j x_j$, we use a real maize (Zea mays) F2
population with 247 genotypes (each one with two repetitions), 195 molecular
markers, and four traits: grain yield (GY, ton ha$^{-1}$), plant height (PHT, cm), ear
height (EHT, cm), and anthesis day (AD, days), evaluated in one environment. In
an F2 population, the marker homozygous loci for the allele from the first parental
line can be coded by 1, whereas the marker homozygous loci for the allele from the
second parental line can be coded by -1, and the marker heterozygous loci by 0.
Fig. 4.1 Efficiency of the linear molecular selection index with respect to phenotypic selection for
the one-trait case for different values of the variance of the marker score when the phenotypic and
genetic variances are fixed


Fig. 4.2 Selection response values of the linear molecular selection index for the one-trait case for
different values of the variance of the marker score when the phenotypic and genetic variances are
fixed (x-axis: variance of the marker score values; y-axis: LMSI response values)


For this example, we used trait PHT. Only seven markers were statistically linked
to the PHT. The estimated vector of regression coefficients for these seven markers
was $\hat{\boldsymbol{\theta}}' = [5.46 \;\; -4.54 \;\; 0.98 \;\; 7.39 \;\; -7.75 \;\; -1.91 \;\; -3.53]$. Table 4.1 presents
the first 20 genotypes, the coded values of the seven selected markers, and
the first 20 estimated $\hat{s}_{PHT}$ values of the 247 genotypes in the maize (Zea mays) F2
population. According to $\hat{\boldsymbol{\theta}}'$ and the coded values of the seven markers, the first
estimated $\hat{s}_{PHT}$ value was obtained as $\hat{s}_{PHT_1} = -1.91(1) - 3.53(-1) = 1.62$;
the second estimated $\hat{s}_{PHT}$ value was obtained as
$\hat{s}_{PHT_2} = 5.46(-1) - 4.54(-1) - 1.91(-1) = 0.99$, etc. The 20th estimated $\hat{s}_{PHT}$ value was obtained
as $\hat{s}_{PHT_{20}} = -3.53(-1) = 3.53$. This estimation procedure is valid for any number
of genotypes and markers.
Figure 4.3 shows the distribution of the 247 estimated marker scores associated
with traits PHT and EHT of the maize F2 population. Note that the estimated marker
score values approach a normal distribution.


Table 4.1 Number of selected genotypes, coded values of seven selected markers, and estimated
marker score values obtained from a maize (Zea mays) F2 population with 247 genotypes and
195 molecular markers

                      Coded values of the selected markers
Number of genotypes   M1   M2   M3   M4   M5   M6   M7   Marker score
 1                     0    0    0    0    0    1   -1       1.62
 2                    -1   -1    0    0    0   -1    0       0.99
 3                     0    0    0    0    0    0    1      -3.53
 4                     1    1    0    0    0   -1   -1       6.37
 5                     1    1    0    1    1    1    1       6.72
 6                     0    0    1    0    0    0    0       0.98
 7                     1    1    0    1    1    0    0       0.57
 8                     0    0    0    0    0    0    0       0
 9                     0    0    1    0    0    1    0      -0.93
10                     0    0    1    1    0    0    1       4.84
11                     0    0    0    0    0    0    0       0
12                    -1   -1    0    0    0    0    0      -0.92
13                     0    0    0    0    0    0    0       0
14                     1    1    0   -1   -1    0   -1       4.81
15                     0    0    1   -1   -1    0    0       1.34
16                     0    0    0    0    0    0    0       0
17                    -1   -1    0    0    0    0    1      -4.46
18                    -1   -1    0    0    0    0    1      -4.46
19                    -1   -1    1    0    0   -1    1      -1.56
20                     0    0    0    0    0    0   -1       3.53
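
The worked marker score calculations can be reproduced with a few lines of Python. The sketch below uses the seven estimated regression coefficients for PHT, with the signs of the sixth and seventh coefficients taken as negative because that is what the worked calculations for genotypes 1, 2, and 20 use, and the coded marker values of four genotypes from Table 4.1. The function name is hypothetical.

```python
# Estimated regression coefficients for the seven markers linked to PHT.
theta_hat = [5.46, -4.54, 0.98, 7.39, -7.75, -1.91, -3.53]

# Coded marker values (M1..M7) for a few genotypes of Table 4.1.
genotypes = {
    1: [0, 0, 0, 0, 0, 1, -1],
    2: [-1, -1, 0, 0, 0, -1, 0],
    3: [0, 0, 0, 0, 0, 0, 1],
    20: [0, 0, 0, 0, 0, 0, -1],
}

def marker_score(coded_values, coefficients):
    """Sum of coefficient times coded marker value (estimated marker score)."""
    return sum(c * x for c, x in zip(coefficients, coded_values))

scores = {g: round(marker_score(x, theta_hat), 2) for g, x in genotypes.items()}
print(scores)  # {1: 1.62, 2: 0.99, 3: -3.53, 20: 3.53}
```

The same function applies unchanged to any number of genotypes and selected markers.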


4.3.2 Estimating the Variance of the Marker Score 


Fig. 4.3 Distribution of the marker scores associated with traits (a) plant height and (b) ear height
of a maize (Zea mays) F2 population. Note that the distribution of frequencies of the marker score
values approaches a normal distribution


There are many methods of estimating the variance of the marker score associated
with the ith trait ($\sigma_{s_i}^2$); the first one was proposed by Lande and Thompson (1990).
According to these authors, $\sigma_{s_i}^2$ can be estimated as

$$\hat{\sigma}_{s_i}^2 = \hat{\boldsymbol{\theta}}_i'\mathbf{M}_i\hat{\boldsymbol{\theta}}_i - \frac{M\hat{\sigma}_{e_i}^2}{n}, \qquad (4.29)$$


where $\hat{\boldsymbol{\theta}}_i$ is the estimated vector of regression coefficients of the selected markers,
$\mathbf{M}_i = \dfrac{1}{n}\mathbf{X}_i'\mathbf{X}_i$ is the $M \times M$ covariance matrix of the selected markers that are
statistically linked to the ith trait marker loci; $\hat{\sigma}_{e_i}^2 = \dfrac{1}{n-M-1}\mathbf{y}_i'\mathbf{H}\mathbf{y}_i$ is the unbiased
estimated variance of the residuals, $\mathbf{H} = \mathbf{I} - \mathbf{X}_i(\mathbf{X}_i'\mathbf{X}_i)^{-1}\mathbf{X}_i'$, $\mathbf{I}$ is an $n \times n$ identity matrix,
$M$ is the number of selected markers statistically linked to the QTL, and $\mathbf{X}_i$ is an
$n \times M$ matrix with the coded values of the selected markers. According to Lande and
Thompson (1990), Eq. (4.29) is an unbiased estimator of $\sigma_{s_i}^2$ and its variance can be
written as

$$Var(\hat{\sigma}_{s_i}^2) = \frac{4\sigma_{s_i}^2\sigma_{e_i}^2}{n} + \frac{2M(\sigma_{e_i}^2)^2}{n^2} + \frac{2M^2(\sigma_{e_i}^2)^2}{n^2(n-M)}, \qquad (4.30)$$

which tends to zero when $n$, the number of genotypes or individuals, is very high.
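
A minimal Python sketch of the estimator in Eq. (4.29) is given below. It is applied to simulated data with 247 genotypes and seven markers; the marker effects, the residual standard deviation, and the random seed are hypothetical, and the phenotypes are centered before the regression, as in Sect. 4.3.1. The function name is an assumption of this sketch.

```python
import numpy as np

def marker_score_variance(X, y):
    """Marker score variance of Eq. (4.29), plus the uncorrected term and residual variance."""
    n, M = X.shape
    y_star = y - y.mean()                               # centered phenotypes
    theta_hat, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    M_cov = X.T @ X / n                                 # M_i = X'X / n
    residuals = y_star - X @ theta_hat                  # residuals of the regression
    sigma2_e = residuals @ residuals / (n - M - 1)      # residual variance
    naive = theta_hat @ M_cov @ theta_hat               # theta' M theta, uncorrected
    return naive - M * sigma2_e / n, naive, sigma2_e

# Simulated data (hypothetical effects and noise level).
rng = np.random.default_rng(7)
n, M = 247, 7
X = rng.choice([-1.0, 0.0, 1.0], size=(n, M), p=[0.25, 0.5, 0.25])
theta = np.array([5.0, -4.0, 1.0, 7.0, -7.0, -2.0, -3.0])
y = 200.0 + X @ theta + rng.normal(0.0, 13.0, size=n)

sigma2_s, naive, sigma2_e = marker_score_variance(X, y)
print("theta' M theta      :", round(float(naive), 2))
print("residual variance   :", round(float(sigma2_e), 2))
print("corrected estimate  :", round(float(sigma2_s), 2))
```

The corrected estimate is always smaller than the uncorrected quadratic form, and the size of the correction grows with the number of selected markers and the residual variance and shrinks with the number of genotypes.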
From Eq. (4.29), it is possible to obtain an estimator of the covariance between
the ith and jth marker scores when the number of selected markers statistically linked
to the QTL is the same in the ith and jth traits. Thus, by Eq. (4.29), the covariance
between the ith and jth marker scores can be estimated as

$$\hat{\sigma}_{s_{ij}} = \hat{\boldsymbol{\theta}}_i'\mathbf{M}_{ij}\hat{\boldsymbol{\theta}}_j - \frac{M\hat{\sigma}_{e_{ij}}}{n}, \qquad (4.31)$$

where $\hat{\boldsymbol{\theta}}_i$ and $\hat{\boldsymbol{\theta}}_j$ are the estimated vectors of regression coefficients of the selected
markers associated with the ith and jth trait loci respectively; $\mathbf{M}_{ij} = \dfrac{1}{n}\mathbf{X}_i'\mathbf{X}_j$ is the
$M \times M$ covariance matrix of the markers statistically linked to the ith and jth trait
marker loci; $\mathbf{X}_i$ and $\mathbf{X}_j$ are $n \times M$ matrices with the coded values of the selected
markers associated with the ith and jth trait loci respectively;
$\hat{\sigma}_{e_{ij}} = \dfrac{1}{n-M-1}\mathbf{y}_i'\mathbf{H}_{ij}\mathbf{y}_j$ is
the estimated covariance of the residuals between the ith ($\mathbf{y}_i$) and jth ($\mathbf{y}_j$) trait values,
$\mathbf{H}_{ij} = \mathbf{I} - \mathbf{X}_i(\mathbf{X}_i'\mathbf{X}_j)^{-1}\mathbf{X}_j'$, $\mathbf{I}$ is an $n \times n$ identity matrix, and $M$ is the number of
selected markers statistically linked to the QTL.

According to the PHT values described in Sect. 4.3.1 of this chapter, $M = 7$,
$n = 247$, $\hat{\sigma}_{e_{PHT}}^2 = 180.80$ and $\hat{\sigma}_{s_{PHT}}^2 = 48.23$ (Eq. 4.29). Note that
$\hat{\sigma}_{s_{PHT}}^2 < \hat{\sigma}_{g_{PHT}}^2$, where $\hat{\sigma}_{g_{PHT}}^2 = 83.0$ is an estimate of the genetic variance of PHT.
The estimated portion of the genetic variance attributable to $\hat{\sigma}_{s_{PHT}}^2 = 48.23$ was
$\hat{q}_{PHT} = \dfrac{48.23}{83.0} = 0.5811$; that
is, the seven markers explain 58.11% of the genetic variance associated with PHT.
Charcosset and Gallais (1996) considered two possible methods of estimating
$\sigma_{s_i}^2$ based on the coefficient of multiple determination or squared multiple
correlation $R^2$ (note that in this case $R^2$ is not the square of the selection response).
The coefficient $R^2$ gives the portion of the total variation in the phenotypic values
that is "explained" by, or attributable to, the markers and can be written as

$$R^2 = \frac{\hat{\boldsymbol{\theta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2}{\mathbf{y}'\mathbf{y} - n\bar{y}^2} = \frac{\hat{\sigma}_s^2}{\hat{\sigma}_y^2}, \tag{4.32a}$$

where $\hat{\boldsymbol{\theta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2$ is the overall regression sum of squares adjusted for the intercept
and $\mathbf{y}'\mathbf{y} - n\bar{y}^2$ is the total sum of squares adjusted for the mean. The coefficient $R^2$ is
equal to 1 if the fitted equation $y_i = \theta_0 + \sum_{j \in M}\theta_j x_j + e_i$ passes through all the data
points, so that all residuals are null; then, the markers explain all the phenotypic
variance. At the other extreme, $R^2$ is zero if $\hat{y}_i = \hat{\theta}_0$ and the estimated regression
coefficients are null, i.e., $\hat{\theta}_1 = \hat{\theta}_2 = \cdots = \hat{\theta}_M = 0$. In the latter case, markers do not
affect the phenotypic observations and the variance of the marker score values is
zero. Thus, the $R^2$ values are between 0 and 1, i.e., $0 \le R^2 \le 1.0$. Equation (4.32a) is
useful for estimating $\sigma_{s_i}^2$ as $\hat{\sigma}_{y_i}^2\sum_{j=1}^{M} R_j^2 = \hat{\sigma}_{s_i}^2$, where $R_j^2$ is the estimated $R^2$ value of the jth
marker and $\hat{\sigma}_{y_i}^2$ is the phenotypic variance of the ith trait; however, this is a biased
estimator of $\sigma_{s_i}^2$ (Hospital et al. 1997).

Charcosset and Gallais (1996) and Hospital et al. (1997) proposed an unbiased
estimator of $\sigma_{s_i}^2$ based on all the selected markers using the adjusted coefficient of
multiple determination, i.e.,

$$R_{Adj}^2 = 1 - \frac{n-1}{n-M-1}\left(1 - R^2\right) = \frac{\hat{\sigma}_s^2}{\hat{\sigma}_y^2}, \tag{4.32b}$$

whence we can obtain an unbiased estimator of $\sigma_s^2$ as $\hat{\sigma}_s^2 = R_{Adj}^2\hat{\sigma}_y^2$ by jointly using all
the markers that affect the phenotypic values. The problem with Eq. (4.32b) is that
the $R_{Adj}^2$ values could be negative; in that case, the estimated value of $\sigma_s^2$ would also be
negative. One additional problem with Eq. (4.32b) is that the $R_{Adj}^2$ values can produce
$\hat{\sigma}_s^2$ values that are higher than those of the estimated variance of the breeding values.

Using Eqs. (4.32a) and (4.32b), we can estimate $\sigma_s^2$, but from them it is not clear
how we can estimate the covariance between two different estimated marker score
values.

Consider the case of the PHT values described in Sect. 4.3.1 of this chapter,
where $M = 7$, $n = 247$, and the estimated variance of PHT was $\hat{\sigma}_{y_{PHT}}^2 = 191.81$. The
estimated values of $R^2$ for each of the seven markers were 0.0038, 0.0005, 0.006,
0.0013, 0.0036, 0.0114, and 0.0298, whence, by multiplying each estimated $R^2$
value by $\hat{\sigma}_{y_{PHT}}^2 = 191.81$ and summing the results, we found that the estimated
value of $\sigma_{s_{PHT}}^2$ was $\hat{\sigma}_{s_{PHT}}^2 = 9.78$. In this case, the estimated portion of the genetic
variance attributable to $\hat{\sigma}_{s_{PHT}}^2 = 9.78$ was $\hat{q}_{PHT} = \frac{9.78}{83} = 0.1178$; thus, when we
estimated $\sigma_{s_{PHT}}^2$ according to Eq. (4.32a), the seven markers explained only 11.78% of
the genetic variance associated with PHT.

The estimated value of $R_{Adj}^2$ for the seven markers jointly was 0.06, whence
$\hat{\sigma}_{s_{PHT}}^2 = (191.81)(0.06) = 11.50$ is an estimate of $\sigma_{s_{PHT}}^2$. In the latter case, the estimated
portion of the genetic variance attributable to $\hat{\sigma}_{s_{PHT}}^2 = 11.50$ was
$\hat{q}_{PHT} = \frac{11.5}{83} = 0.1385$; that is, according to Eq. (4.32b), the seven markers explain
13.85% of the genetic variance associated with PHT.
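
The behavior of the adjusted coefficient in Eq. (4.32b) can be illustrated with a minimal sketch. The function name `adjusted_r2` and the three input $R^2$ values are illustrative assumptions and are not taken from the maize data; only $n = 247$ and $M = 7$ come from the PHT example.

```python
def adjusted_r2(r2: float, n: int, m: int) -> float:
    """Adjusted coefficient of multiple determination, following Eq. (4.32b)."""
    if n - m - 1 <= 0:
        raise ValueError("requires n > m + 1")
    return 1.0 - (n - 1) / (n - m - 1) * (1.0 - r2)


n, m = 247, 7  # genotypes and selected markers in the PHT example

# Hypothetical R^2 values, chosen only to show the change of sign.
for r2 in (0.01, 0.05, 0.10):
    print(f"R2 = {r2:.2f} -> adjusted R2 = {adjusted_r2(r2, n, m):.4f}")
```

Setting Eq. (4.32b) to zero shows that the adjusted value is negative whenever $R^2 < M/(n-1)$, which for these values of $n$ and $M$ is roughly 0.028; this is the situation in which the estimated marker score variance would also be negative.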

One additional way of estimating the variance of the marker score $\sigma_s^2$ was
proposed by Lange and Whittaker (2001) as

$$\hat{\sigma}_s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\hat{s}_i - \hat{\mu}_s\right)^2, \tag{4.33}$$

where $\hat{s}_i = \sum_{j=1}^{M}\hat{\theta}_j x_j$ and $\hat{\mu}_s$ is the mean of the $\hat{s}_i$ values. The covariance between the ith
and jth marker scores can be estimated as the cross products of the marker score
values divided by $n - 1$. Note that in this case, the number of markers associated
with the ith and jth traits may be different.

For the PHT values described in Sect. 4.3.1 of this chapter, where $n = 247$, the
estimated value of $\sigma_{s_{PHT}}^2$ was $\hat{\sigma}_{s_{PHT}}^2 = 15.75$ and the estimated portion of the genetic
variance attributable to $\hat{\sigma}_{s_{PHT}}^2 = 15.75$ was $\hat{q}_{PHT} = \frac{15.75}{83} = 0.1897$. That is, the
seven markers jointly explain 18.97% of the genetic variance associated with PHT
according to Eq. (4.33).
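
The four estimates of the marker score variance for PHT can be compared side by side. The following sketch only recomputes the portions $\hat{q}_{PHT}$ from the values reported in the text; the names `marker_score_variance` and `explained_portion` are illustrative assumptions.

```python
GENETIC_VARIANCE_PHT = 83.0  # estimated genetic variance of PHT (from the text)

# Estimated marker score variances reported in the text, keyed by equation.
marker_score_variance = {
    "Eq. 4.29": 48.23,
    "Eq. 4.32a": 9.78,
    "Eq. 4.32b": 11.50,
    "Eq. 4.33": 15.75,
}


def explained_portion(var_s: float, var_g: float) -> float:
    """Portion of the genetic variance attributable to the marker score."""
    return var_s / var_g


portions = {
    eq: explained_portion(var_s, GENETIC_VARIANCE_PHT)
    for eq, var_s in marker_score_variance.items()
}

for eq, q in portions.items():
    print(f"{eq}: q = {q:.4f} ({100 * q:.2f}% of the genetic variance)")
```

The spread between the estimators is large for the same seven markers, which is why the choice of estimator matters for the LMSI weight, response, and efficiency.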


4.3.3 Estimating LMSI Selection Response and Efficiency 


With the estimated phenotypic variance ($\hat{\sigma}_{y_{PHT}}^2 = 191.81$), the estimated genetic
variance ($\hat{\sigma}_{g_{PHT}}^2 = 83.0$), and the estimated marker score variances $\hat{\sigma}_{s_{PHT}}^2 = 48.23$
(Eq. 4.29), $\hat{\sigma}_{s_{PHT}}^2 = 9.78$ (Eq. 4.32a), $\hat{\sigma}_{s_{PHT}}^2 = 11.50$ (Eq. 4.32b), and $\hat{\sigma}_{s_{PHT}}^2 = 15.75$
(Eq. 4.33), we can estimate the LMSI coefficient, selection response, and efficiency.

Using the estimated value $\hat{\sigma}_{s_{PHT}}^2 = 48.23$ obtained with Eq. (4.29), it is possible to
estimate the LMSI weight as

$$\hat{\beta}_{PHT} = \frac{\hat{\sigma}_{g_{PHT}}^2 - \hat{\sigma}_{s_{PHT}}^2}{\hat{\sigma}_{y_{PHT}}^2 - \hat{\sigma}_{s_{PHT}}^2} = \frac{83.0 - 48.23}{191.81 - 48.23} = 0.242,$$

whereas for $\hat{\sigma}_{s_{PHT}}^2 = 9.78$, $\hat{\sigma}_{s_{PHT}}^2 = 11.50$, and $\hat{\sigma}_{s_{PHT}}^2 = 15.75$, the estimated values
of $\hat{\beta}_{PHT}$ were 0.402, 0.40, and 0.382 respectively. The latter results indicate that the
estimated values of $\hat{\beta}_{PHT}$ associated with the phenotypic values tend to decrease when
the estimated values of the variance of the marker score increase. This means that at the
limit, when all the genetic variance is explained by the markers, the estimated values of
$\hat{\beta}_{PHT}$ are zero and the estimated LMSI is equal to $\hat{I}_M = \hat{s}_{PHT}$. Thus, for trait PHT, when
the estimated values of $\hat{\beta}_{PHT}$ are not zero, the estimated LMSI can be written as
$\hat{I}_M = \hat{s}_{PHT} + \hat{\beta}_{PHT}(\mathrm{PHT} - \hat{s}_{PHT})$. The $\hat{I}_M$ values are used to predict the net genetic
merit of each candidate for selection and to rank and select the candidates.
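
The dependence of the LMSI weight on the marker score variance can be checked numerically. This is a minimal sketch assuming the single-trait weight $(\hat{\sigma}_g^2 - \hat{\sigma}_s^2)/(\hat{\sigma}_y^2 - \hat{\sigma}_s^2)$ used above; the function names and the phenotype and marker score passed to `lmsi_value` are hypothetical.

```python
def lmsi_weight(var_g: float, var_y: float, var_s: float) -> float:
    """Single-trait LMSI weight on the phenotype adjusted for the marker score."""
    return (var_g - var_s) / (var_y - var_s)


def lmsi_value(phenotype: float, marker_score: float, beta: float) -> float:
    """Estimated LMSI: marker score plus weighted (phenotype - marker score)."""
    return marker_score + beta * (phenotype - marker_score)


VAR_Y, VAR_G = 191.81, 83.0  # PHT phenotypic and genetic variances (from the text)

for var_s in (48.23, 9.78, 11.50, 15.75):
    print(f"var_s = {var_s:6.2f} -> beta = {lmsi_weight(VAR_G, VAR_Y, var_s):.3f}")

# Hypothetical candidate: phenotype and marker score are made-up numbers.
beta = lmsi_weight(VAR_G, VAR_Y, 48.23)
print(lmsi_value(phenotype=12.0, marker_score=5.0, beta=beta))
```

When the marker score variance equals the genetic variance the weight is zero, and `lmsi_value` returns the marker score itself, which is the limiting case described in the text.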

Based on the result $\hat{\sigma}_{s_{PHT}}^2 = 48.23$ obtained with Eq. (4.29) and using a selection
intensity of 10% ($k = 1.755$), the estimated LMSI selection response can be
obtained as

$$\hat{R}_M = k\sqrt{\frac{\hat{\sigma}_g^2(\hat{\sigma}_g^2 - \hat{\sigma}_s^2) + \hat{\sigma}_s^2(\hat{\sigma}_y^2 - \hat{\sigma}_g^2)}{\hat{\sigma}_y^2 - \hat{\sigma}_s^2}} = 1.755\sqrt{\frac{83(83 - 48.23) + 48.23(191.81 - 83)}{191.81 - 48.23}} = 1.755\sqrt{56.65} = 13.21.$$

In a similar manner, using the result $\hat{\sigma}_{s_{PHT}}^2 = 15.75$, the estimated selection
response was

$$\hat{R}_M = 1.755\sqrt{\frac{83(83 - 15.75) + 15.75(191.81 - 83)}{191.81 - 15.75}} = 1.755\sqrt{41.44} = 11.30.$$

With $\hat{\sigma}_{s_{PHT}}^2 = 9.78$ and $\hat{\sigma}_{s_{PHT}}^2 = 11.50$, the estimated values of the LMSI
selection responses were 10.99 and 11.10 respectively. The latter results indicate that
the estimated values of the LMSI selection responses tend to increase when the
estimated values of the variance of the marker score increase.
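
The selection response calculation can be reproduced with a short sketch. It assumes the single-trait expression shown in the worked example (genetic, phenotypic, and marker score variances combined under a square root and scaled by the selection intensity); the function name `lmsi_response` is an illustrative assumption.

```python
import math


def lmsi_response(k: float, var_g: float, var_y: float, var_s: float) -> float:
    """Single-trait LMSI selection response for selection intensity k."""
    numerator = var_g * (var_g - var_s) + var_s * (var_y - var_g)
    return k * math.sqrt(numerator / (var_y - var_s))


K = 1.755                    # selection intensity for 10% selected
VAR_Y, VAR_G = 191.81, 83.0  # PHT phenotypic and genetic variances (from the text)

for var_s in (48.23, 15.75, 9.78):
    print(f"var_s = {var_s:6.2f} -> response = {lmsi_response(K, VAR_G, VAR_Y, var_s):.2f}")
```

As a sanity check on the expression, setting the marker score variance to zero reduces it to $k\hat{\sigma}_g^2/\hat{\sigma}_y$, the usual single-trait response to phenotypic selection.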

We can estimate LMSI versus phenotypic efficiency for one trait as

$$\hat{\lambda}_M = \sqrt{\frac{\hat{q}}{\hat{h}^2} + \frac{(1 - \hat{q})^2}{1 - \hat{q}\hat{h}^2}},$$

where $\hat{h}^2$ is the estimated trait heritability and $\hat{q} = \frac{\hat{\sigma}_s^2}{\hat{\sigma}_g^2}$ is
the estimated portion of additive genetic variance explained by the markers. When
$\hat{\sigma}_{s_{PHT}}^2 = 48.23$, $\hat{q}_{PHT} = \frac{48.23}{83} = 0.5811$, and $\hat{h}^2 = 0.433$, the estimated LMSI efficiency
was $\hat{\lambda}_M = \sqrt{1.58} = 1.25$. For $\hat{\sigma}_{s_{PHT}}^2 = 15.75$, $\hat{\sigma}_{s_{PHT}}^2 = 9.78$, and
$\hat{\sigma}_{s_{PHT}}^2 = 11.50$, the estimated portions of the additive genetic variance explained by
the markers were $\hat{q}_{PHT} = \frac{15.75}{83} = 0.1897$, $\hat{q}_{PHT} = \frac{9.78}{83} = 0.1178$, and
$\hat{q}_{PHT} = \frac{11.5}{83} = 0.1385$ respectively, whence the estimated LMSI efficiencies were
1.1, 1.04, and 1.05 respectively. The latter results indicate that the estimated values
of LMSI efficiency tend to increase when the estimated values of the variance of the
marker score increase (Fig. 4.1).
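
The efficiency values can be recomputed from the estimated heritability and the portion of genetic variance explained by the markers. This is a minimal sketch of that calculation; the function name `lmsi_efficiency` is an illustrative assumption, and the inputs are the rounded values quoted in the text.

```python
import math


def lmsi_efficiency(q: float, h2: float) -> float:
    """LMSI efficiency relative to phenotypic selection for one trait."""
    return math.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))


H2_PHT = 0.433  # estimated heritability of PHT (from the text)

for q in (0.5811, 0.1897, 0.1178, 0.1385):
    print(f"q = {q:.4f} -> efficiency = {lmsi_efficiency(q, H2_PHT):.3f}")
```

With `q = 0` the function returns exactly 1, that is, no gain over phenotypic selection when the markers explain none of the genetic variance. The computed values agree with those reported in the text up to the rounding used there.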

Figure 4.1 presents the change in LMSI efficiency with respect to phenotypic
selection for different values of the variance of the marker score when the
phenotypic (191.81) and genetic (83) variances are fixed. In a similar manner, Fig. 4.2
presents the change in the LMSI selection response for different values of the
variance of the marker score when the phenotypic (191.81) and genetic (83)
variances are fixed. In effect, LMSI efficiency and the selection response depend on the
genetic variance explained by the markers.
4.3.4 Estimating the Variance of the Marker Score
in the Multi-Trait Case 


Equation (4.33) can be used in the multi-trait context when the numbers of markers 
associated with the ith and jth traits are different. Also, it is possible to adapt 
Eqs. (4.32a) and (4.32b) to the multi-trait case. However, in the latter case, in 
addition to the markers linked to the QTL that affect one specific trait, we need to 
find markers that affect more than one trait, which may be very difficult. For this 
reason, in the multi-trait context, Eqs. (4.32a) and (4.32b) could be used to estimate 
the variance of the marker score (S) without preselecting the markers that affect the 
phenotypic traits, only when the number of genotypes is higher than the number of 
markers. 


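Let $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_t$ be $t$ independent multivariate normal vectors of observations,
each with $n$ observations, such that

$$\mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1t} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nt} \end{bmatrix} = \begin{bmatrix} \mathbf{y}_1 & \mathbf{y}_2 & \cdots & \mathbf{y}_t \end{bmatrix}$$

is a matrix $n \times t$ of observations for $t$ traits; then, the multivariate linear regression
model can be written as $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U}$, where $\mathbf{X}$ is a matrix $n \times m$ ($m$ = number of
markers and $m < n$) of known coded marker values, $\mathbf{B}$ is a matrix $m \times t$ of regression
coefficients, and $\mathbf{U}$ is a matrix $n \times t$ of unobserved random disturbance whose rows
for given $\mathbf{X}$ are uncorrelated, each with mean $\mathbf{0}$ and common covariance matrix $\mathbf{E}$
(Mardia et al. 1982; Rencher 2002). According to the least squares method of
estimation, $\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ is an estimator of $\mathbf{B}$ and

$$\hat{\mathbf{E}} = \frac{(\mathbf{Y} - \mathbf{X}\hat{\mathbf{B}})'(\mathbf{Y} - \mathbf{X}\hat{\mathbf{B}})}{n - m - 1}$$

is an estimator of the residual covariance matrix $\mathbf{E}$ assuming that $n > m$ (Johnson and
Wichern 2007).

Note that $1 - R^2 = \frac{\mathbf{e}'\mathbf{e}}{\mathbf{y}'\mathbf{y}}$, where $\mathbf{e}$ is a vector of estimated residual values of the
model $y_i = \theta_0 + \sum_{j \in M}\theta_j x_j + e_i$ and $R^2$ is the coefficient of multiple determination
(Eq. 4.32a). In addition, as in the multi-trait context the estimated matrix of residuals
is $\hat{\mathbf{U}} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}}$, $1 - R^2$ can be written as $\mathbf{D} = (\mathbf{Y}'\mathbf{Y})^{-1}\hat{\mathbf{U}}'\hat{\mathbf{U}}$ (Mardia et al. 1982),
whence $R^2$ in the multivariate context can be written as

$$\mathbf{R}^2 = \mathbf{I} - \mathbf{D} = \hat{\mathbf{P}}^{-1}\hat{\mathbf{S}}, \tag{4.34a}$$

whereas $R_{Adj}^2$ (Eq. 4.32b) can be written as

$$\mathbf{R}_{Adj}^2 = \mathbf{I} - \frac{n-1}{n-m-1}\mathbf{D} = \hat{\mathbf{P}}^{-1}\hat{\mathbf{S}}, \tag{4.34b}$$

where $\mathbf{I}$ is an identity matrix $t \times t$, $\hat{\mathbf{P}}^{-1}$ is the inverse of the estimated covariance
matrix of phenotypic values ($\hat{\mathbf{P}}$), and $\hat{\mathbf{S}}$ is the estimated covariance matrix of marker
score values. From Eq. (4.34b),

$$\hat{\mathbf{P}}\mathbf{R}_{Adj}^2 = \hat{\mathbf{S}} \tag{4.34c}$$

is an unbiased estimator of matrix $\mathbf{S}$, whereas $\hat{\mathbf{P}}\mathbf{R}^2 = \hat{\mathbf{S}}$ (Eq. 4.34a) is a biased
estimator of matrix $\mathbf{S}$. The main problem of Eq. (4.34c) is that the diagonal elements
of $\hat{\mathbf{S}}$ could be negative.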

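From the maize $F_2$ population including 247 genotypes (each one with two
repetitions) and 195 molecular markers described in Sect. 4.3.1, we used two
traits, PHT (cm) and EHT (cm), to illustrate the multivariate method of estimating
the LMSI parameters. The estimated phenotypic and genetic covariance matrices
are denoted $\hat{\mathbf{P}}$ and $\hat{\mathbf{C}}$ respectively, whereas the estimated
covariance matrix of marker scores, using Eq. (4.33), was

$$\hat{\mathbf{S}} = \begin{bmatrix} 15.750 & 0.983 \\ 0.983 & 28.083 \end{bmatrix}.$$

When we used Eq. (4.34a) and Eq. (4.34c), we obtained estimated values of the
variance and covariance of the marker scores that were higher than the genetic values
(data not presented). Equations (4.29) and (4.31) are used later to compare LMSI
efficiency versus GW-LMSI efficiency using the simulated data described in Chap. 2,
Sect. 2.8.1.

With matrices $\hat{\mathbf{P}}$, $\hat{\mathbf{C}}$, and $\hat{\mathbf{S}}$, and the vector of economic weights $\mathbf{a}_M' = [\mathbf{w}' \;\; \mathbf{0}']$,
where $\mathbf{w}' = [-1 \;\; -1]$ and $\mathbf{0}' = [0 \;\; 0]$, we obtained the estimated matrices
$\hat{\mathbf{T}}_M = \begin{bmatrix} \hat{\mathbf{P}} & \hat{\mathbf{S}} \\ \hat{\mathbf{S}} & \hat{\mathbf{S}} \end{bmatrix}$ and $\hat{\mathbf{Z}}_M = \begin{bmatrix} \hat{\mathbf{C}} & \hat{\mathbf{S}} \\ \hat{\mathbf{S}} & \hat{\mathbf{S}} \end{bmatrix}$, whence the estimated LMSI vector of coefficients
was $\hat{\boldsymbol{\beta}}' = \mathbf{a}_M'\hat{\mathbf{Z}}_M\hat{\mathbf{T}}_M^{-1} = [-0.59 \;\; -0.18 \;\; -0.41 \;\; -0.82]$. Using a selection
intensity of 10% ($k_I = 1.755$), the estimated LMSI selection response and the expected
genetic gains per trait were $\hat{R}_M = k_I\sqrt{\hat{\boldsymbol{\beta}}'\hat{\mathbf{T}}_M\hat{\boldsymbol{\beta}}} = 20.41$ and
$\hat{\mathbf{E}}_M' = k_I\frac{\hat{\boldsymbol{\beta}}'\hat{\mathbf{Z}}_M}{\sqrt{\hat{\boldsymbol{\beta}}'\hat{\mathbf{T}}_M\hat{\boldsymbol{\beta}}}} = [-10.09 \;\; -10.31 \;\; -2.53 \;\; -4.39]$ respectively, whereas the estimated LMSI
accuracy was $\hat{\rho}_{HI_M} = \frac{\hat{\sigma}_{I_M}}{\hat{\sigma}_H} = 0.72$.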


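The estimated LPSI parameters (see Chap. 2 for details) using the phenotypic
information from the maize $F_2$ population for traits PHT and EHT are as follows.
The estimated LPSI vector of coefficients was $\hat{\mathbf{b}}' = \mathbf{w}'\hat{\mathbf{C}}\hat{\mathbf{P}}^{-1} = [-0.53 \;\; -0.36]$,
and, with a selection intensity of 10% ($k_I = 1.755$), the estimated LPSI selection
response and the expected genetic gains per trait were $\hat{R}_I = k_I\sqrt{\hat{\mathbf{b}}'\hat{\mathbf{P}}\hat{\mathbf{b}}} = 18.97$ and
$\hat{\mathbf{E}}' = k_I\frac{\hat{\mathbf{b}}'\hat{\mathbf{C}}}{\sqrt{\hat{\mathbf{b}}'\hat{\mathbf{P}}\hat{\mathbf{b}}}} = [-10.52 \;\; -8.45]$ respectively, whereas the estimated LPSI
accuracy was $\hat{\rho}_{HI} = \frac{\hat{\sigma}_I}{\hat{\sigma}_H} = 0.67$.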
OH 
We can determine LMSI efficiency versus LPSI efficiency to predict the net
genetic merit using the ratio of estimated accuracy values $\hat{\rho}_{HI_M} = 0.72$ and
$\hat{\rho}_{HI} = 0.67$ of the LMSI and LPSI respectively, i.e., $\hat{\lambda}_M = \frac{0.72}{0.67} = 1.075$, whence,
according to Eq. (4.19), the estimated LMSI efficiency versus the LPSI efficiency,
in percentage terms, was $\hat{p}_M = 100(1.075 - 1) = 7.5$. That is, for these data, the
estimated LMSI efficiency was only 7.5% greater than LPSI efficiency at predicting
the net genetic merit.
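
The percentage efficiency is a one-line calculation, sketched here from the rounded accuracies and responses quoted in the text. The function name `efficiency_percent` is an illustrative assumption; the underlying form, 100 times the accuracy ratio minus one, is the one used in the worked example.

```python
def efficiency_percent(accuracy_a: float, accuracy_b: float) -> float:
    """Efficiency of index A versus index B, in percent."""
    return 100.0 * (accuracy_a / accuracy_b - 1.0)


lmsi_accuracy, lpsi_accuracy = 0.72, 0.67    # estimated accuracies (from the text)
lmsi_response, lpsi_response = 20.41, 18.97  # estimated responses (from the text)

print(f"from accuracies: {efficiency_percent(lmsi_accuracy, lpsi_accuracy):.1f}%")
print(f"from responses:  {efficiency_percent(lmsi_response, lpsi_response):.1f}%")
```

Because both indices use the same selection intensity and the same standard deviation of the net genetic merit, the ratio of the selection responses should agree with the ratio of the accuracies up to rounding, which gives a quick consistency check on the reported figures.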


4.4 Estimating the GW-LMSI Parameters 
in the Asymptotic Context 


Lange and Whittaker (2001) proposed the GW-LMSI. However, these authors did 
not provide detailed procedures for estimating matrices P, C, W, and M. They 
indicated that matrix C can be estimated using the estimated matrix of covariance of 
marker scores (S) and that matrices P, W, and M can be estimated directly by their 
empirical variances and covariances, but this assertion does not indicate a clear 
method for estimating those covariance matrices. In Chap. 2, we described the 
REML method of estimating C and P. Crossa and Cerón-Rojas (2011) described
matrices W and M in a doubled haploid population. In this study, we describe and
estimate matrices W and M for an $F_2$ population in the asymptotic context according
to the Wright and Mowers (1994) approach, which is based on regressing phenotype
values on marker coded values. We used this latter approach to estimate W and M,
because it is a clearer estimation method than that of Lange and Whittaker (2001);
however, the Wright and Mowers (1994) approach is an asymptotic method and
should be regarded with caution.

Matrix M is the covariance matrix of the molecular marker code values. All
marker information used to construct matrix M is presented in Table 4.2. Based on
this information, we found that the expectations ($E(X_1)$ and $E(X_2)$) and the variances
($V(X_1)$ and $V(X_2)$) of the marker coded values $X_1$ and $X_2$ are $E(X_1) = E(X_2) = 0$ and
$V(X_1) = V(X_2) = 1$, whereas the covariance ($\mathrm{Cov}(X_1, X_2)$) and correlation
($\mathrm{Corr}(X_1, X_2)$) between $X_1$ and $X_2$ were

$$\mathrm{Cov}(X_1, X_2) = \mathrm{Corr}(X_1, X_2) = 1 - 2\delta. \tag{4.35}$$

Thus, as the variances of $X_1$ and $X_2$ are equal to 1, the correlation between $X_1$ and
$X_2$ is $\mathrm{Corr}(X_1, X_2) = \frac{1 - 2\delta}{1} = 1 - 2\delta$, i.e., the covariance and correlation
between $X_1$ and $X_2$ are the same. Equation (4.35) results indicate that if we perform
the same operation with many markers, we will obtain similar results; they also
indicate that this is the way to construct matrix M.
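Table 4.2 Marker genotypes, expected frequencies, and coded values ($X_1$ and $X_2$) of the marker genotypes in an $F_2$ population

Marker genotype | Expected frequency | $X_1$ | $X_2$
$A_1B_1/A_1B_1$ | $(1-\delta)^2/4$ | 1 | 1
$A_1B_1/A_1B_2$ | $2(\delta-\delta^2)/4$ | 1 | 0
$A_1B_2/A_1B_2$ | $\delta^2/4$ | 1 | -1
$A_1B_1/A_2B_1$ | $2(\delta-\delta^2)/4$ | 0 | 1
$A_1B_2/A_2B_1$ | $2(1-2\delta+2\delta^2)/4$ | 0 | 0
$A_1B_2/A_2B_2$ | $2(\delta-\delta^2)/4$ | 0 | -1
$A_2B_1/A_2B_1$ | $\delta^2/4$ | -1 | 1
$A_2B_1/A_2B_2$ | $2(\delta-\delta^2)/4$ | -1 | 0
$A_2B_2/A_2B_2$ | $(1-\delta)^2/4$ | -1 | -1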

Let $\mathbf{X}$ be a matrix of coded markers of size $n \times m$, where $n > m$ and
$m$ = number of markers; then according to Wright and Mowers (1994), because
all marker information is contained in matrix $\mathbf{X}'\mathbf{X}$, when the number of observations
($n$) tends to infinity, the product $\mathbf{x}_i'\mathbf{x}_j/n$ tends to the covariance between the ith and
jth markers, whence matrix $n^{-1}\mathbf{X}'\mathbf{X}$ should tend to the covariance matrix between
the markers that make up matrix $\mathbf{X}$, with the ijth element equal to $(0.5 - \delta_{ij})$. Thus,
matrix $2n^{-1}\mathbf{X}'\mathbf{X}$ should tend to a covariance matrix where the ijth entry is equal to
$(1 - 2\delta_{ij})$. Based on the latter result, an estimator of matrix M in the asymptotic
context is

$$\hat{\mathbf{M}} = 2n^{-1}\mathbf{X}'\mathbf{X}. \tag{4.36}$$


Equation (4.36) is an asymptotic result and should be taken with caution. To date, 
there has been no clear method for estimating M in the non-asymptotic context; for 
this reason, Eq. (4.36) is used to estimate the GW-LMSI parameters. 
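
The limits stated for $n^{-1}\mathbf{X}'\mathbf{X}$ and $2n^{-1}\mathbf{X}'\mathbf{X}$ can be checked directly from the expected genotype frequencies in Table 4.2. The sketch below computes the expected cross product of the two marker codes under the 1, 0, -1 coding for a few values of $\delta$; the function names are illustrative assumptions.

```python
def f2_two_marker_table(delta: float):
    """Rows (expected frequency, x1, x2) for the nine F2 genotypes of Table 4.2."""
    d = delta
    return [
        ((1 - d) ** 2 / 4, 1, 1),
        (2 * (d - d ** 2) / 4, 1, 0),
        (d ** 2 / 4, 1, -1),
        (2 * (d - d ** 2) / 4, 0, 1),
        (2 * (1 - 2 * d + 2 * d ** 2) / 4, 0, 0),
        (2 * (d - d ** 2) / 4, 0, -1),
        (d ** 2 / 4, -1, 1),
        (2 * (d - d ** 2) / 4, -1, 0),
        ((1 - d) ** 2 / 4, -1, -1),
    ]


def moments(delta: float):
    """Total frequency, means of X1 and X2, and expected cross product E[X1*X2]."""
    rows = f2_two_marker_table(delta)
    total = sum(f for f, _, _ in rows)
    mean1 = sum(f * x1 for f, x1, _ in rows)
    mean2 = sum(f * x2 for f, _, x2 in rows)
    cross = sum(f * x1 * x2 for f, x1, x2 in rows)
    return total, mean1, mean2, cross


for delta in (0.0, 0.1, 0.25, 0.5):
    total, mean1, mean2, cross = moments(delta)
    print(f"delta = {delta:.2f}: E[X1*X2] = {cross:.3f}, 2*E[X1*X2] = {2 * cross:.3f}")
```

For every $\delta$ the frequencies sum to one, both means are zero, the expected cross product equals $0.5 - \delta$, and doubling it gives $1 - 2\delta$, which is the entry that Eq. (4.36) targets.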

Assume that a QTL is between the two markers in Table 4.2; then, $\delta$ can be
written as $\delta = r_1 + r_2 - 2r_1r_2$, where $r_1$ and $r_2$ denote the recombination frequencies
between the QTL and marker 1 and between the QTL and marker 2 respectively. When the
number of genotypes or individuals tends to infinity, the covariance between the
phenotypic trait values ($y$) and the marker 1 coded values ($X_1$) in an $F_2$ population
can be written as

$$\mathrm{Cov}(X_1, y) = \frac{1}{2}\alpha_1(1 - 2r_1), \tag{4.37}$$

where $\alpha_1(1 - 2r_1)$ is the portion of the additive effect ($\alpha_1$) of the QTL linked to
marker 1 (Edwards et al. 1987), and $r_1$ is the recombination frequency between the
QTL and marker 1. We can assume that for many markers, the covariance between the
marker coded values and the phenotypic values is similar to Eq. (4.37), whence matrix W
can be obtained.

Let $\mathbf{y}$ be a vector $n \times 1$ of recorded phenotypic values, where $n$ denotes the
number of observations or records, and $\mathbf{X}$ is a matrix of coded markers of size $n \times m$.
When $n$ tends to infinity, $2n^{-1}\mathbf{X}'\mathbf{y}$ tends to be a vector with elements equal to
$\alpha_i(1 - 2r_i)$, where $\alpha_i$ is the additive effect of the ith QTL linked to the ith marker,
and $r_i$ is the recombination frequency between the ith QTL and the ith marker. Now
let

$$\mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1t} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nt} \end{bmatrix} = \begin{bmatrix} \mathbf{y}_1 & \mathbf{y}_2 & \cdots & \mathbf{y}_t \end{bmatrix}$$

be a matrix of observations for $t$ traits; then, an
estimator of matrix W in the asymptotic context is

$$\hat{\mathbf{W}} = 2n^{-1}\mathbf{X}'\mathbf{Y}. \tag{4.38}$$

Once again, Eq. (4.38) is an asymptotic result and should be accepted with 
caution. But to date, there has been no clear method for estimating W in the 
non-asymptotic context; for this reason, Eq. (4.38) is used to estimate the 
GW-LMSI parameters. 
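
Both asymptotic estimators are simple scaled cross products, as the following sketch shows. The helper names and the toy matrices `x` and `y` are hypothetical and serve only to show the shapes involved ($m \times m$ for the marker matrix and $m \times t$ for the marker-by-trait matrix); they are not data from the book.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def asymptotic_estimators(x, y):
    """Return (M_hat, W_hat) = (2/n X'X, 2/n X'Y) as in Eqs. (4.36) and (4.38)."""
    n = len(x)
    xt = transpose(x)
    m_hat = [[2.0 * v / n for v in row] for row in matmul(xt, x)]
    w_hat = [[2.0 * v / n for v in row] for row in matmul(xt, y)]
    return m_hat, w_hat


# Toy example: n = 4 individuals, m = 2 markers coded 1, 0, -1, and t = 1 trait.
x = [[1, 1], [0, 1], [-1, 0], [-1, -1]]
y = [[2.0], [1.0], [-1.0], [-2.0]]

m_hat, w_hat = asymptotic_estimators(x, y)
print(m_hat)
print(w_hat)
```

With so few observations the result is far from its limit, which is consistent with the caution attached to Eqs. (4.36) and (4.38): they are asymptotic results and depend on a large number of genotypes.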


4.5 Comparing LMSI Versus LPSI and GW-LMSI 
Efficiency 


To compare LMSI efficiency versus GW-LMSI efficiency for predicting the net
genetic merit, we use the simulated data set described in Chap. 2, Sect. 2.8.1.

Figure 4.4 presents the estimated accuracy values of the LPSI ($\hat{\rho}_{HI} = \frac{\hat{\sigma}_I}{\hat{\sigma}_H}$), the
LMSI ($\hat{\rho}_{HI_M} = \frac{\hat{\sigma}_{I_M}}{\hat{\sigma}_H}$), and the GW-LMSI ($\hat{\rho}_{HI_W} = \frac{\hat{\sigma}_{I_W}}{\hat{\sigma}_H}$) for five simulated selection
cycles. In addition, Table 4.3 presents the estimated LPSI, LMSI, and GW-LMSI
selection responses, the estimated LPSI, LMSI, and GW-LMSI variances of the
predicted error ($(1 - \hat{\rho}_{HI}^2)\hat{\sigma}_H^2$, $(1 - \hat{\rho}_{HI_M}^2)\hat{\sigma}_H^2$, and $(1 - \hat{\rho}_{HI_W}^2)\hat{\sigma}_H^2$ respectively), the
ratios of the estimated LMSI accuracy to the estimated LPSI accuracy and the
estimated LMSI accuracy to the estimated GW-LMSI accuracy, expressed as
percentages (Eq. 4.19), for five simulated selection cycles.

According to Fig. 4.4, for this data set the estimated LMSI accuracy ($\hat{\rho}_{HI_M}$) was
higher than the estimated LPSI and GW-LMSI accuracy ($\hat{\rho}_{HI}$ and $\hat{\rho}_{HI_W}$ respectively),
for the five simulated selection cycles, that is, $\hat{\rho}_{HI_M} > \hat{\rho}_{HI} > \hat{\rho}_{HI_W}$. In a similar
manner, Table 4.3 results indicate that the estimated LMSI selection response ($\hat{R}_M$)
was higher than the estimated LPSI and GW-LMSI selection responses ($\hat{R}_I$ and $\hat{R}_W$
respectively): $\hat{R}_M > \hat{R}_I > \hat{R}_W$.

Note that the estimated LPSI, LMSI, and GW-LMSI variances of the predicted
error, and the estimated LMSI efficiency versus LPSI efficiency and versus
GW-LMSI efficiency (expressed in percentages) are related to the estimated
LMSI, LPSI, and GW-LMSI accuracies, and that in all five selection cycles,
$\hat{\rho}_{HI_M} > \hat{\rho}_{HI} > \hat{\rho}_{HI_W}$. This implies that the estimated LMSI variance of the predicted
error was lower than the estimated LPSI and GW-LMSI variance of the predicted
error. In a similar manner, because $\hat{\rho}_{HI_M} > \hat{\rho}_{HI} > \hat{\rho}_{HI_W}$, the estimated LMSI
efficiency was higher than the estimated LPSI efficiency and the estimated
GW-LMSI efficiency.

Based on Fig. 4.4 and Table 4.3 results, we conclude that the LMSI was a better
predictor of the net genetic merit than the LPSI, and that the LPSI is a better predictor
of the net genetic merit than the GW-LMSI for this simulated data set.

[Figure 4.4: accuracy values (vertical axis) by selection cycle (horizontal axis) for the LPSI, LMSI, and GW-LMSI]

Fig. 4.4 Estimated correlation values of the linear phenotypic selection index (LPSI), the linear
molecular selection index (LMSI), and the genome-wide LMSI (GW-LMSI) with the net genetic
merit for four traits, 2500 markers and 500 genotypes (each with four repetitions) in one
environment for five simulated selection cycles

Table 4.3 Estimated linear phenotypic, molecular, and genome-wide selection indices (LPSI,
LMSI, and GW-LMSI respectively), selection responses and variance of the predicted error, and
estimated ratio of LMSI accuracy to LPSI and GW-LMSI accuracy expressed in percentages for
4 traits, 2500 markers and 500 genotypes (each with four repetitions) in one environment for five
simulated selection cycles

Cycle | Selection response: LPSI | LMSI | GW-LMSI | Variance of the predicted error: LPSI | LMSI | GW-LMSI | Efficiency of LMSI versus (%): LPSI | GW-LMSI
1 | 17.84 | 19.60 | 16.24 | 22.53 | 0.07 | 39.84 | 10.07 | 20.67
2 | 15.66 | 24.36 | 13.88 | 22.66 | 0.07 | 40.06 | 12.14 | 26.81
3 | 14.44 | 14.70 | 12.13 | 21.95 | 1.86 | 39.86 | 3.43 | 21.27
4 | 14.29 | 15.29 | 12.48 | 22.84 | 1.46 | 39.09 | 6.57 | 22.50
5 | 13.86 | 15.15 | 11.49 | 22.13 | 0.88 | 39.65 | 11.11 | 31.88
Average | 15.22 | 17.82 | 13.24 | 22.42 | 0.87 | 39.70 | 8.66 | 24.63
Chapter 5
Linear Genomic Selection Indices


Abstract The linear genomic selection index (LGSI) is a linear combination of
genomic estimated breeding values (GEBVs) used to predict the individual net genetic
merit and select individual candidates from a nonphenotyped testing population as
parents of the next selection cycle. In the LGSI, phenotypic and marker data from the
training population are fitted into a statistical model to estimate all individual available
genome marker effects; these estimates can then be used in subsequent selection
cycles to obtain GEBVs that are predictors of breeding values in a testing population
for which there is only marker information. The GEBVs are obtained by multiplying
the estimated marker effects in the training population by the coded marker values
obtained in the testing population in each selection cycle. Applying the LGSI in plant
or animal breeding requires genotyping the candidates for selection to obtain the
GEBVs, and predicting and ranking the net genetic merit of the candidates for selection
using the LGSI. We describe the LGSI and show that it is a direct application of the
linear phenotypic selection index theory in the genomic selection context; next, we
present the combined LGSI (CLGSI), which uses phenotypic and GEBV information
jointly to predict the net genetic merit. The CLGSI can be used only in training
populations when there is phenotypic and marker information, whereas the LGSI is
used in testing populations where there is only marker information. We validate the
theoretical results of the LGSI and CLGSI using real and simulated data.


5.1 The Linear Genomic Selection Index 


5.1.1 Basic Conditions for Constructing the LGSI 


The conditions described in Chap. 4 (Sect. 4.1.1) for constructing a valid linear
molecular selection index (LMSI) are also necessary for the linear genomic selection
index (LGSI); however, in addition to those conditions, the LGSI also requires:


1. All marker effects to be estimated simultaneously in the training population. 

2. The estimated marker effects to be used in subsequent selection cycles to obtain 
GEBVs that are predictors of the individual breeding values in the testing 
population (candidates for selection) for which there is only marker information. 


© The Author(s) 2018
J. J. Cerón-Rojas, J. Crossa, Linear Selection Indices in Modern Plant Breeding,
https://doi.org/10.1007/978-3-319-91223-3_5


3. The GEBV values to be composed entirely of the additive genetic effects. 

4. Phenotypes to be used to estimate all marker effects in the training population, not 
to make selections in the testing population (Heffner et al. 2009; Lorenz et al. 
2011). 


5.1.2 Genomic Breeding Values and Marker Effects 


The breeding value ($g_i$) is the average additive effects of the genes an individual
receives from both parents; thus, it is a function of the genes transmitted from parents
to progeny and is the only component that can be selected and, therefore, the main
component of interest in breeding programs (Mrode 2005). The ith phenotypic value
($y_i$) can be denoted as $y_i = g_i + e_i$, where $g_i$ is the breeding value and $e_i$ the residual.
Basic assumptions for $g_i$ and $e_i$ are: both $g_i$ and $e_i$ have normal distribution with
expectation equal to zero and variance $\sigma_g^2$ and $\sigma_e^2$ respectively. This means that
$y_i = \mu_i + g_i + e_i$ is a linear mixed model (Mrode 2005; Searle et al. 2006), where $\mu_i$ is
the mean of $y_i$.

Let $\mathbf{y}_i' = [y_{i1} \;\; y_{i2} \;\; \cdots \;\; y_{in}]$ be a vector $1 \times n$ of observations in the ith trait and let
$\mathbf{g}_i' = [g_{i1} \;\; g_{i2} \;\; \cdots \;\; g_{in}]$ be a vector $1 \times n$ of unobservable breeding values
associated with $\mathbf{y}_i$; then $\mathbf{y}_i$ can be written as

$$\mathbf{y}_i = \mathbf{1}\mu_i + \mathbf{Z}\mathbf{g}_i + \mathbf{e}_i, \tag{5.1}$$

where $\mu_i$ is the mean of the ith trait, $\mathbf{1}$ is a vector $n \times 1$ of 1s, $\mathbf{Z}$ is a design matrix of
0s and 1s, $\mathbf{g}_i \sim MVN(\mathbf{0}, \mathbf{A}\sigma_{g_i}^2)$ is a vector of breeding values, and $\mathbf{e}_i \sim MVN(\mathbf{0}, \mathbf{I}_n\sigma_{e_i}^2)$
is a vector of residuals; $\mathbf{0}$ is the mean and $\mathbf{A}\sigma_{g_i}^2$ and $\mathbf{I}_n\sigma_{e_i}^2$ the covariance matrix of $\mathbf{g}_i$
and $\mathbf{e}_i$ respectively; $\mathbf{A}$ is the numerical relationship matrix (Mrode 2005) and $\mathbf{I}_n$ an
identity matrix $n \times n$; $\sigma_{g_i}^2$ and $\sigma_{e_i}^2$ are the additive and residual variances associated
with $\mathbf{g}_i$ and $\mathbf{e}_i$; and MVN stands for multivariate normal distribution.

Suppose that $\mathbf{A}$, $\mathbf{Z}$, $\mu_i$, $\sigma_{g_i}^2$, and $\sigma_{e_i}^2$ are known; then, according to Mrode (2005), the
best linear unbiased predictor (BLUP) of $\mathbf{g}_i$ can be written as

$$\hat{\mathbf{g}}_i = \sigma_{g_i}^2\mathbf{A}\mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y}_i - \mathbf{1}\mu_i), \tag{5.2}$$

where $\mathbf{V}^{-1}$ is the inverse matrix of the variance of $\mathbf{y}_i$, i.e.,
$\mathrm{Var}(\mathbf{y}_i) = \sigma_{g_i}^2\mathbf{Z}\mathbf{A}\mathbf{Z}' + \mathbf{I}_n\sigma_{e_i}^2 = \mathbf{V}$. In the context of animal breeding, Eq. (5.2) is
considered a univariate linear phenotypic selection index (LPSI) (Mrode 2005) 
and is used to rank and select individuals as parents of the next generation in the 
context of one trait. Equation (5.2) can be extended to the multi-trait phenotypic 
selection index case, but to predict the net genetic merit (H — w'g, see Chap. 2 for 
details) it would be necessary to construct linear combinations of the predicted 
values of g; associated with the traits of interest as was described in the Foreword 
of this book. 
The vector of the individual genomic breeding values ($\boldsymbol{\gamma}_i$) associated with the ith
characteristic ($i = 1, 2, \ldots, t$; $t$ = number of traits) of the candidates for selection can
be written as

$$\boldsymbol{\gamma}_i = \mathbf{X}\mathbf{u}_i, \tag{5.3}$$

where $\mathbf{X}$ is an $n \times m$ matrix ($n$ = number of observations and $m$ = number of
markers in the population) of coded marker values ($2 - 2p$, $1 - 2p$, and $-2p$ for
genotypes AA, Aa, and aa respectively) associated with the additive effects of the
quantitative trait loci (QTL) and $\mathbf{u}_i$ is an $m \times 1$ vector of the additive effects of the
QTL associated with markers that affect the ith trait. It is assumed that $\boldsymbol{\gamma}_i$ has MVN
with mean $\mathbf{0}$ and variance $\mathbf{G}\sigma_{\gamma_i}^2$, i.e., $\boldsymbol{\gamma}_i \sim MVN(\mathbf{0}, \mathbf{G}\sigma_{\gamma_i}^2)$, where $\sigma_{\gamma_i}^2$ is the
additive genomic variance of $\boldsymbol{\gamma}_i$ and $\mathbf{G} = \mathbf{X}\mathbf{X}'/c$ is the $n \times n$ additive genomic
relationship matrix between genotypes; $c = \sum_{j=1}^{m} 2p_j(1 - p_j)$ in an $F_2$ population,
and $c = \sum_{j=1}^{m} 4p_j(1 - p_j)$ in a doubled haploid population; $p_j$ is the frequency of allele
A and $1 - p_j$ is the frequency of allele a in the jth marker ($j = 1, 2, \ldots, m$).

The additive genomic relationship matrix $\mathbf{G} = \mathbf{X}\mathbf{X}'/c$ has special properties.
For example, in the asymptotic context, the expectation of matrix $\mathbf{G}$ is equal to the
numerical relationship matrix $\mathbf{A}$, i.e., $E(\mathbf{G}) = \mathbf{A}$ (Habier et al. 2007; Van Raden 2008);
this means that $\mathbf{G}$ is a particular realization of $\mathbf{A}$ and when the number of markers and
genotypes increases in the training population, the value of $\mathbf{G}$ tends to concentrate
around $\mathbf{A}$. Thus, it can be assumed that at the limit, when the number of markers and
genotypes is very high, $\mathbf{G} = \mathbf{A}$ (Cerón-Rojas and Sahagún-Castellanos 2016).
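
The marker coding and the genomic relationship matrix can be sketched in a few lines. The example assumes that genotypes are stored as counts of allele A (2, 1, 0 for AA, Aa, aa) and that allele frequencies are estimated from the sample itself; both choices, the function names, and the toy genotype matrix are illustrative assumptions.

```python
def allele_frequencies(genotypes):
    """Sample frequency of allele A per marker from counts of A (0, 1, or 2)."""
    n = len(genotypes)
    return [sum(col) / (2.0 * n) for col in zip(*genotypes)]


def code_markers(genotypes, freqs):
    """Code AA, Aa, aa as 2 - 2p, 1 - 2p, -2p (allele count minus 2p)."""
    return [[g - 2.0 * p for g, p in zip(row, freqs)] for row in genotypes]


def genomic_relationship(x, freqs, doubled_haploid=False):
    """G = XX'/c with c = sum 2p(1-p) (F2) or sum 4p(1-p) (doubled haploid)."""
    factor = 4.0 if doubled_haploid else 2.0
    c = sum(factor * p * (1.0 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(ri, rj)) / c for rj in x] for ri in x]


# Hypothetical genotypes: 4 individuals (rows) by 3 markers (columns).
genotypes = [[2, 1, 0], [1, 1, 2], [0, 2, 1], [1, 0, 1]]

freqs = allele_frequencies(genotypes)
x = code_markers(genotypes, freqs)
g = genomic_relationship(x, freqs)

for row in g:
    print([round(v, 3) for v in row])
```

Because the coded values are centered on the sample allele frequencies in this sketch, each row of the resulting matrix sums to zero; with frequencies taken from a reference population instead, that property would not hold.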
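The vector of genomic breeding values (Eq. 5.3) has a similar function in
genomic selection as $\mathbf{g}_i$ in the phenotypic selection context. In addition, $\mathbf{g}_i$ can be
written as $\mathbf{g}_i = \boldsymbol{\gamma}_i + \boldsymbol{\eta}_i$, where $\boldsymbol{\eta}_i = \mathbf{g}_i - \boldsymbol{\gamma}_i$ (Gianola et al. 2003). Also, note that

$$\mathrm{Cov}(\mathbf{g}_i, \boldsymbol{\gamma}_i) = \sigma_{\gamma_i}^2, \tag{5.4}$$

i.e., the covariance between $\boldsymbol{\gamma}_i$ and $\mathbf{g}_i$ is equal to the variance of $\boldsymbol{\gamma}_i$ (Dekkers 2007).

Let $\mathbf{y}_i' = [y_{i1} \;\; y_{i2} \;\; \cdots \;\; y_{in}]$ be a vector $1 \times n$ of observations of the ith trait in the
training population and let $\boldsymbol{\gamma}_i' = [\gamma_{i1} \;\; \gamma_{i2} \;\; \cdots \;\; \gamma_{in}]$ be a vector $1 \times n$ of
unobservable genomic breeding values associated with $\mathbf{y}_i$; then, $\mathbf{y}_i$ can also be
written as

$$\mathbf{y}_i = \mathbf{1}\mu_i + \mathbf{Z}\boldsymbol{\gamma}_i + \boldsymbol{\varepsilon}_i, \tag{5.5}$$

where $\mu_i$ is the mean of the ith trait, $\mathbf{1}$ is a vector $n \times 1$ of 1s, $\mathbf{Z}$ is a design matrix,
$\boldsymbol{\gamma}_i \sim MVN(\mathbf{0}, \mathbf{G}\sigma_{\gamma_i}^2)$ and $\boldsymbol{\varepsilon}_i \sim MVN(\mathbf{0}, \mathbf{I}_n\sigma_{\varepsilon_i}^2)$ are vectors of genomic breeding values
and of residuals respectively, and $\sigma_{\varepsilon_i}^2$ is the residual variance. $\mathbf{I}_n$, $\mathbf{G}$, and $\sigma_{\gamma_i}^2$ were
defined in Eqs. (5.2) and (5.3).

According to Eqs. (5.2) and (5.3), when $\mu_i$, $\sigma_{\gamma_i}^2$, and $\sigma_{\varepsilon_i}^2$ are known, the vector of
GEBVs for the individuals with the ith trait can be obtained as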
$$\hat{\boldsymbol{\gamma}}_i = \sigma_{\gamma_i}^2\mathbf{G}\mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y}_i - \mathbf{1}\mu_i), \tag{5.6}$$

where the variance of $\mathbf{y}_i$ should now be written as $\mathbf{V} = \sigma_{\gamma_i}^2\mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{I}_n\sigma_{\varepsilon_i}^2$. In the
context of genomic selection, Eq. (5.6) is considered a univariate LGSI and is used to
rank and select individuals as parents of the next generation (Van Raden 2008;
Togashi et al. 2011). Equation (5.6) is the BLUP of $\boldsymbol{\gamma}_i$ and can be extended to a
multi-trait genomic selection index, but to predict the net genetic merit ($H = \mathbf{w}'\mathbf{g}$), it is
necessary to construct an LGSI, which is a linear combination of the $\hat{\boldsymbol{\gamma}}_i$.

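Although Eq. (5.6) is theoretically very important in LGSI, in practice we need to
estimate the marker effects associated with all the traits of interest and to use these
estimates in the testing population to obtain the GEBV of the candidates for
selection. Let $\mathbf{u}' = [\mathbf{u}_1' \;\; \mathbf{u}_2' \;\; \cdots \;\; \mathbf{u}_t']$ be a vector $1 \times mt$ associated with $t$ traits.
In the univariate context, Van Raden (2008) showed that the ith vector $\mathbf{u}_i$ of marker
effects in the training population can be estimated as

$$\hat{\mathbf{u}}_i = c^{-1}\mathbf{X}'[\mathbf{G} + \nu\mathbf{I}_n]^{-1}(\mathbf{y}_i - \mathbf{1}\mu_i), \tag{5.7}$$

where $\nu = \frac{\sigma_{e_i}^2}{\sigma_{g_i}^2}$ and the other parameters were defined earlier. According to
Ceron-Rojas et al. (2015), to estimate the vector $\mathbf{u}' = [\mathbf{u}_1' \;\; \mathbf{u}_2' \;\; \cdots \;\; \mathbf{u}_t']$ in the
multi-trait context, Eq. (5.7) can be written as

$$\hat{\mathbf{u}} = c^{-1}\mathbf{W}_t'[(\mathbf{I}_t \otimes \mathbf{G}) + (\mathbf{N} \otimes \mathbf{I}_n)]^{-1}(\mathbf{y} - \boldsymbol{\mu} \otimes \mathbf{1}), \tag{5.8}$$

where $\mathbf{W}_t = \mathbf{I}_t \otimes \mathbf{X}$, "$\otimes$" denotes the Kronecker product (Schott 2005), $c$ and $\mathbf{X}$ were
defined in Eq. (5.3); $\mathbf{N} = \mathbf{R}\mathbf{C}^{-1}$, where $\mathbf{R}$ and $\mathbf{C}$ are the residual and breeding value
covariance matrices for $t$ traits respectively; $\mathbf{y}' = [\mathbf{y}_1' \;\; \mathbf{y}_2' \;\; \cdots \;\; \mathbf{y}_t'] \sim MVN(\boldsymbol{\mu}, \mathbf{V})$
is a vector of size $1 \times tn$, with covariance matrix $\mathbf{V} = \mathbf{C} \otimes \mathbf{G} + \mathbf{R} \otimes \mathbf{I}_n$; $\mathbf{I}_t$ is an
identity matrix of size $t \times t$ and $\mathbf{I}_n$ was defined earlier; $\boldsymbol{\mu}' = [\mu_1 \;\; \mu_2 \;\; \cdots \;\; \mu_t]$ is a
vector $1 \times t$ of means associated with vector $\mathbf{y}$, and $\mathbf{1}$ is a vector $n \times 1$ of 1s. In this
case, the estimator of the vector of sub-vectors of genomic breeding values
$\boldsymbol{\gamma}' = [\boldsymbol{\gamma}_1' \;\; \boldsymbol{\gamma}_2' \;\; \cdots \;\; \boldsymbol{\gamma}_t']$ in the testing population can be obtained as

$$\hat{\boldsymbol{\gamma}} = \mathbf{W}_t\hat{\mathbf{u}}. \tag{5.9}$$

Equation (5.9) is the vector of GEBVs for the multi-trait case. Thus, in the testing
population, in Eq. (5.9), only the coded values in matrix $\mathbf{X}$ change, whereas $\hat{\mathbf{u}}$ is the
same in each selection cycle. Note that to obtain Eqs. (5.7) and (5.8), we assumed
that $\boldsymbol{\mu}$, $\mathbf{C}$, and $\mathbf{R}$ are known.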
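
The single-trait case can be sketched end to end: estimate the marker effects in a training population with Eq. (5.7) and multiply them by the coded markers of the testing population to obtain GEBVs. All numbers below (coded markers, phenotypes, mean, variance ratio, and the constant `c`) are hypothetical, and the small Gaussian elimination routine is only a stand-in for a proper linear algebra library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def marker_effects(x, y, mu, c, nu):
    """u_hat = c^-1 X' [G + nu I]^-1 (y - 1 mu), with G = XX'/c (Eq. 5.7)."""
    n = len(x)
    g_plus = [[sum(a * b for a, b in zip(x[i], x[j])) / c + (nu if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    alpha = solve(g_plus, [yi - mu for yi in y])
    return [sum(col[i] * alpha[i] for i in range(n)) / c for col in zip(*x)]


def gebv(x_new, u_hat):
    """GEBVs of candidates: coded markers times estimated marker effects."""
    return [sum(v * u for v, u in zip(row, u_hat)) for row in x_new]


# Hypothetical training data: 4 individuals, 3 coded markers, one trait.
x_train = [[1.0, 0.0, -1.0], [0.0, 0.0, 1.0], [-1.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
y_train = [10.0, 12.0, 9.0, 11.0]
mu, c, nu = 10.5, 1.5, 1.0  # assumed known mean, scaling constant, variance ratio

u_hat = marker_effects(x_train, y_train, mu, c, nu)

# Hypothetical testing population: only marker information is available.
x_test = [[1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
gebv_test = gebv(x_test, u_hat)
print(u_hat)
print(gebv_test)
```

By a standard matrix identity, the same `u_hat` also solves the ridge-type system $(\mathbf{X}'\mathbf{X} + \nu c\mathbf{I})\hat{\mathbf{u}} = \mathbf{X}'(\mathbf{y} - \mathbf{1}\mu)$, which gives a convenient numerical check and is cheaper to solve when there are fewer markers than individuals.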

We indicated that the genomic breeding values have normal distribution
(Eq. 5.5). Using the simulated data described in Chap. 2, Sect. 2.8.1, in Fig. 5.1
we present the distribution of the GEBVs (Eq. 5.9) associated with trait T1 in the
first (Fig. 5.1a) and the fifth (Fig. 5.1b) selection cycles in the testing population. In
effect, the frequency distribution of the GEBVs approaches a normal distribution in
both selection cycles.

Fig. 5.1 Distribution of the genomic estimated breeding values (GEBVs) associated with trait T1
in (a) the first and (b) the fifth selection cycles in the testing population


5.1.3 The LGSI and Its Parameters 


Similar to the LPSI (Chap. 2), the objective of the LGSI is to predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$, where $\mathbf{g}' = [g_1 \; g_2 \; \ldots \; g_t]$ ($t$ = number of traits) is a vector of unobservable true breeding values and $\mathbf{w}' = [w_1 \; w_2 \; \ldots \; w_t]$ is a vector of economic weights. Suppose that the genomic breeding values $\boldsymbol{\gamma}_i = \mathbf{X}\mathbf{u}_i$ are known; then, the LGSI can be written as

$$I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}, \tag{5.10}$$

where $\boldsymbol{\beta}$ is an unknown vector of weights.

The main advantage of the LGSI over the LPSI lies in the possibility of reducing the intervals between selection cycles ($L_G$) by more than two thirds (Lorenz et al. 2011); thus, this parameter should be incorporated into the LGSI selection response and the expected genetic gain per trait to reflect the main advantage of the LGSI over the LPSI and the other indices. Assuming that $L_G = 1$, in the LPSI context we wrote the selection response as $R_I = k_I\sigma_H\rho_{HI}$; however, if $L_G \neq 1$, the LGSI selection response can be written as

$$R_{I_G} = \frac{k_I}{L_G}\frac{\sigma_{HI_G}}{\sqrt{\sigma^2_{I_G}}} = \frac{k_I}{L_G}\sigma_H\rho_{HI_G}, \tag{5.11}$$

where $k_I$ is the standardized selection differential (or selection intensity) associated with the LGSI, $\sigma_{HI_G}$ is the covariance between $H = \mathbf{w}'\mathbf{g}$ and the LGSI, $\sigma^2_{I_G}$ is the variance of the LGSI, $\sigma_H$ is the standard deviation of $H$, $\rho_{HI_G}$ is the correlation between $H$ and the LGSI, and $L_G$ denotes the intervals between selection cycles.

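Let $\mathbf{C}$ and $\boldsymbol{\Gamma}$ be the covariance matrices of the breeding values ($\mathbf{g}$) and of the genomic breeding values ($\boldsymbol{\gamma}$) respectively; then, the correlation between $H = \mathbf{w}'\mathbf{g}$ and $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$ can be written as

$$\rho_{HI_G} = \frac{\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}\sqrt{\boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}}}, \tag{5.12}$$

where $\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \sigma_{HI_G}$ is the covariance between $H = \mathbf{w}'\mathbf{g}$ and $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$, $\sigma_H = \sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$ is the standard deviation of $H = \mathbf{w}'\mathbf{g}$, and $\sigma_{I_G} = \sqrt{\boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}}$ is the standard deviation of $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$.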


5.1.4 Maximizing LGSI Parameters 


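To maximize the genomic selection response (Eq. 5.11), suppose that $k_I$, $\sigma_H$, and $L_G$ are fixed and take the derivative of the natural logarithm (ln) of the correlation between $H$ and $I_G$ (Eq. 5.12) with respect to vector $\boldsymbol{\beta}$, equate the result of the derivative to the null vector, and isolate $\boldsymbol{\beta}$, i.e.,

$$\frac{\partial}{\partial\boldsymbol{\beta}}\ln\rho_{HI_G} = \frac{\partial}{\partial\boldsymbol{\beta}}\ln\left(\frac{\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}\sqrt{\boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}}}\right) = \mathbf{0}. \tag{5.13}$$

The result is $\boldsymbol{\beta} = s\mathbf{w}$, where $s = \boldsymbol{\beta}'\boldsymbol{\Gamma}\boldsymbol{\beta}/\mathbf{w}'\boldsymbol{\Gamma}\boldsymbol{\beta}$ is a proportionality constant that does not affect the maximum value of $\rho_{HI_G}$, because the correlation is invariant to a change of scale; then, assuming that $\boldsymbol{\beta} = \mathbf{w}$, the maximized LGSI selection response can be written as

$$R_{I_G} = \frac{k_I}{L_G}\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}. \tag{5.14}$$

Hereafter, we refer to the LGSI genomic selection response as that of Eq. (5.14). Also, because $\boldsymbol{\beta} = \mathbf{w}$, Eq. (5.12) can be written as

$$\rho_{HI_G} = \frac{\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}} = \frac{\sigma_{I_G}}{\sigma_H}, \tag{5.15}$$

which is the maximized correlation between $H = \mathbf{w}'\mathbf{g}$ and $I_G = \boldsymbol{\beta}'\boldsymbol{\gamma}$, or LGSI accuracy; $\sigma_H = \sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$ is the standard deviation of $H$, and $\sigma_{I_G} = \sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}$ is the standard deviation of $I_G$.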
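
The maximization result can be checked numerically. The sketch below, which assumes NumPy and uses hypothetical two-trait matrices (none of the numbers come from this chapter), evaluates Eq. (5.12) for $\boldsymbol{\beta} = \mathbf{w}$ and for many random weight vectors, and also evaluates Eq. (5.14).

```python
import numpy as np

def lgsi_accuracy(beta, w, Gamma, C):
    """Eq. (5.12): correlation between H = w'g and I_G = beta'gamma."""
    return (w @ Gamma @ beta) / np.sqrt((w @ C @ w) * (beta @ Gamma @ beta))

def lgsi_response(w, Gamma, k_I, L_G=1.0):
    """Eq. (5.14): maximized LGSI selection response (beta = w)."""
    return k_I / L_G * np.sqrt(w @ Gamma @ w)

# Hypothetical 2-trait covariance matrices and economic weights.
Gamma = np.array([[1.0, 0.4], [0.4, 2.0]])
C = np.array([[1.5, 0.5], [0.5, 3.0]])
w = np.array([2.0, -1.0])

rho_max = lgsi_accuracy(w, w, Gamma, C)             # Eq. (5.15)
rng = np.random.default_rng(0)
rho_other = [lgsi_accuracy(rng.normal(size=2), w, Gamma, C) for _ in range(1000)]
response = lgsi_response(w, Gamma, k_I=1.755, L_G=1.5)
```

No random weight vector should give a larger correlation than `rho_max`, and rescaling the weights (for example, using `3 * w`) leaves the correlation unchanged, which is why the constant $s$ can be dropped.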

The LGSI expected genetic gain per trait ($\mathbf{E}_{I_G}$) can be written as

$$\mathbf{E}_{I_G} = \frac{k_I}{L_G}\frac{\boldsymbol{\Gamma}\mathbf{w}}{\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}}. \tag{5.16}$$

All the terms in Eq. (5.16) were previously defined.

Let $\lambda_G = \rho_{HI_G}/\rho_{HI}$ be LGSI efficiency versus LPSI efficiency to predict the net genetic merit, where $\rho_{HI_G}$ is the LGSI accuracy and $\rho_{HI}$ the LPSI accuracy; in percentage terms, LGSI efficiency versus LPSI efficiency for each selection cycle can be written as

$$p_G = 100(\lambda_G - 1). \tag{5.17}$$

According to Eq. (5.17), if $p_G > 0$, LGSI efficiency is greater than LPSI efficiency; if $p_G = 0$, the efficiency of both selection indices is equal, and if $p_G < 0$, the LPSI is more efficient than the LGSI at predicting $H = \mathbf{w}'\mathbf{g}$.

Equation (5.17) is useful for measuring LGSI efficiency in terms of accuracy when predicting the net genetic merit ($H = \mathbf{w}'\mathbf{g}$), whereas the Technow et al. (2013) inequality measures LGSI efficiency in terms of the time needed to complete one selection cycle. In the context of the LGSI and the LPSI, the Technow inequality can be written as

$$L_G < \frac{\rho_{HI_G}}{h_I}L_P, \tag{5.18}$$

where $L_G$ and $L_P$ denote the time required to complete one selection cycle for the LGSI and the LPSI respectively, $\rho_{HI_G}$ is the LGSI accuracy, and $h_I$ is the square root of the heritability (Lin and Allaire 1977; Nordskog 1978) of the LPSI, which can be denoted as $h_I = \sqrt{\mathbf{b}'\mathbf{C}\mathbf{b}/\mathbf{b}'\mathbf{P}\mathbf{b}}$ (see Chap. 2 for details). Then, assuming that the selection intensity is the same for both selection indices, if Eq. (5.18) is true, the LGSI is more efficient than the LPSI per unit of time.
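
The two efficiency criteria, Eqs. (5.17) and (5.18), are simple enough to express as small helper functions. The following sketch uses hypothetical accuracies, heritability, and cycle lengths; the function names are illustrative.

```python
def lgsi_efficiency_percent(rho_lgsi, rho_lpsi):
    """Eq. (5.17): p_G = 100 * (lambda_G - 1) with lambda_G = rho_HIG / rho_HI."""
    return 100.0 * (rho_lgsi / rho_lpsi - 1.0)

def technow_inequality_holds(L_G, L_P, rho_lgsi, h_I):
    """Eq. (5.18): True when L_G < (rho_HIG / h_I) * L_P."""
    return L_G < (rho_lgsi / h_I) * L_P

# Hypothetical inputs; cycle lengths are in years.
p_G = lgsi_efficiency_percent(rho_lgsi=0.6, rho_lpsi=0.5)
faster = technow_inequality_holds(L_G=1.5, L_P=4.0, rho_lgsi=0.5, h_I=0.8)
```

In this toy case the LGSI is 20% more accurate than the LPSI, and the inequality holds because the right-hand side, (0.5/0.8) x 4.0 = 2.5, exceeds the LGSI cycle length of 1.5.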


5.1.5 Relationship Between the LGSI and LPSI Selection 
Responses 


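To obtain the relationship between $R_{I_G}$ and $R_I$ in the asymptotic context, we omitted the intervals between selection cycles ($L_G$ and $L_P$ respectively) to simplify the algebra. Consider a population where the number of genotypes and markers tends to infinity; in this case, markers explain most of the true additive genetic variances and covariances. Thus, we can assume that matrices $\boldsymbol{\Gamma}$ and $\mathbf{C}$ are very similar, and at the limit, $\boldsymbol{\Gamma} = \mathbf{C}$. Now suppose that in this population the phenotypic variance-covariance matrix ($\mathbf{P}$) is known and comprises matrix $\boldsymbol{\Gamma}$ and the residual variance-covariance matrix ($\mathbf{R}$). In this case, the inverse of $\mathbf{P}$ can be written as $\mathbf{P}^{-1} = (\boldsymbol{\Gamma} + \mathbf{R})^{-1} = \boldsymbol{\Gamma}^{-1} - \boldsymbol{\Gamma}^{-1}(\boldsymbol{\Gamma}^{-1} + \mathbf{R}^{-1})^{-1}\boldsymbol{\Gamma}^{-1}$, where $\boldsymbol{\Gamma}^{-1}$ and $\mathbf{R}^{-1}$ are the inverses of matrices $\boldsymbol{\Gamma}$ and $\mathbf{R}$ respectively. Thus, the LPSI selection response is given by

$$R_I = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = k_I\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{P}^{-1}\boldsymbol{\Gamma}\mathbf{w}} = k_I\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w} - \mathbf{w}'(\boldsymbol{\Gamma}^{-1} + \mathbf{R}^{-1})^{-1}\mathbf{w}}, \tag{5.19}$$

where $\mathbf{b} = \mathbf{P}^{-1}\boldsymbol{\Gamma}\mathbf{w}$ is the vector of coefficients of the LPSI in the asymptotic context. Note that $\mathbf{b}'\mathbf{P}\mathbf{b} \geq 0$ and $\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w} \geq 0$, i.e., $\mathbf{b}'\mathbf{P}\mathbf{b}$ and $\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}$ are positive semi-definite quadratic forms, meaning that $\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w} \geq \mathbf{w}'(\boldsymbol{\Gamma}^{-1} + \mathbf{R}^{-1})^{-1}\mathbf{w} \geq 0$; then, in the asymptotic context, $R_{I_G} \geq R_I$. This result is not common when the number of genotypes and markers is small; however, it gives an idea of the theoretical behavior of $R_{I_G}$ with respect to $R_I$ when the number of markers and genotypes is very large.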
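
The matrix identity behind Eq. (5.19) and the resulting ordering of the two responses can be verified numerically. The sketch below assumes NumPy and uses randomly generated positive definite matrices as stand-ins for $\boldsymbol{\Gamma}$ and $\mathbf{R}$; the helper `random_spd` and the weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(t):
    """Random symmetric positive definite t x t matrix (illustrative only)."""
    A = rng.normal(size=(t, t))
    return A @ A.T + t * np.eye(t)

t, k_I = 3, 1.755
Gamma, R = random_spd(t), random_spd(t)
P = Gamma + R                                       # asymptotic case: C = Gamma
w = np.array([5.0, -0.1, -0.1])
inv = np.linalg.inv

# Both sides of the identity used in Eq. (5.19).
lhs = Gamma @ inv(P) @ Gamma
rhs = Gamma - inv(inv(Gamma) + inv(R))

R_lpsi = k_I * np.sqrt(w @ lhs @ w)                 # Eq. (5.19)
R_lgsi = k_I * np.sqrt(w @ Gamma @ w)               # Eq. (5.14) with L_G omitted
```

Because the subtracted term is a non-negative quadratic form, `R_lgsi` cannot be smaller than `R_lpsi` under these assumptions.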

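Because $g_q$ can be written as $g_q = \gamma_q + \eta_q$, where $\eta_q = g_q - \gamma_q$ ($q = 1, 2, \ldots, t$), for low numbers of markers and genotypes, the covariance genotypic matrix $\mathbf{C}$ can be written as $\mathbf{C} = \boldsymbol{\Gamma} + \mathbf{E}$, where $\mathbf{E} = \mathbf{C} - \boldsymbol{\Gamma}$; then, the inverse of matrix $\mathbf{P}$ can be written as $\mathbf{P}^{-1} = [(\boldsymbol{\Gamma} + \mathbf{E}) + \mathbf{R}]^{-1} = (\boldsymbol{\Gamma} + \mathbf{E})^{-1} - (\boldsymbol{\Gamma} + \mathbf{E})^{-1}[(\boldsymbol{\Gamma} + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}(\boldsymbol{\Gamma} + \mathbf{E})^{-1}$. In the latter case, the LPSI selection response $R_I$ can be written as

$$R_I = k_I\sqrt{\mathbf{w}'(\boldsymbol{\Gamma} + \mathbf{E})\mathbf{P}^{-1}(\boldsymbol{\Gamma} + \mathbf{E})\mathbf{w}} = k_I\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w} + \mathbf{w}'\mathbf{E}\mathbf{w} - \mathbf{w}'[(\boldsymbol{\Gamma} + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}}. \tag{5.20}$$

Equation (5.20) indicates that in the non-asymptotic context (low numbers of markers and genotypes), $R_{I_G}$ and $R_I$ are related in three possible ways:

1. $R_I > R_{I_G}$ if $\mathbf{w}'\mathbf{E}\mathbf{w} > \mathbf{w}'[(\boldsymbol{\Gamma} + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}$
2. $R_I = R_{I_G}$ if $\mathbf{w}'\mathbf{E}\mathbf{w} = \mathbf{w}'[(\boldsymbol{\Gamma} + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}$
3. $R_{I_G} > R_I$ if $\mathbf{w}'\mathbf{E}\mathbf{w} < \mathbf{w}'[(\boldsymbol{\Gamma} + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}$

The second and third points indicate that $R_{I_G}$ may be equal to or larger than $R_I$, even under a small number of markers, depending on the size of $\mathbf{w}'\mathbf{E}\mathbf{w}$ and $\mathbf{w}'[(\boldsymbol{\Gamma} + \mathbf{E})^{-1} + \mathbf{R}^{-1}]^{-1}\mathbf{w}$. These three points explain the theoretical relationship between $R_I$ and $R_{I_G}$ for a low number of markers and genotypes. When $\boldsymbol{\Gamma} = \mathbf{C}$, $\mathbf{E} = \mathbf{0}$, and $R_I = k_I\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w} - \mathbf{w}'(\boldsymbol{\Gamma}^{-1} + \mathbf{R}^{-1})^{-1}\mathbf{w}}$, then $R_{I_G} \geq R_I$.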


5.1.6 Statistical LGSI Properties


Assuming that $H$ and $I_G$ have a joint bivariate normal distribution and that $\boldsymbol{\Gamma}$, $\mathbf{C}$, and $\mathbf{w}$ are known, the LGSI has the following properties:


1. The variance of $I_G$ ($\sigma^2_{I_G}$) and the covariance between $H$ and $I_G$ ($\sigma_{HI_G}$) are equal, i.e., $\sigma^2_{I_G} = \sigma_{HI_G}$.

2. The maximized correlation between $H$ and $I_G$ (or LGSI accuracy) is equal to $\rho_{HI_G} = \sigma_{I_G}/\sigma_H$, where $\sigma_{I_G}$ is the standard deviation of $I_G$ and $\sigma_H$ is the standard deviation of $H$.

3. The variance of the prediction error, $\mathrm{Var}(H - I_G) = (1 - \rho^2_{HI_G})\sigma^2_H$, is minimal. Note that $\mathrm{Var}(H - I_G) = \sigma^2_{I_G} + \sigma^2_H - 2\sigma_{HI_G}$, and when $\boldsymbol{\beta} = \mathbf{w}$, $\sigma^2_{I_G} = \sigma_{HI_G}$, whence $\mathrm{Var}(H - I_G) = \sigma^2_H - \sigma^2_{I_G} = (1 - \rho^2_{HI_G})\sigma^2_H$ is minimal.

4. The total variance of $H$ explained by $I_G$ is $\sigma^2_{I_G} = \rho^2_{HI_G}\sigma^2_H$. It is evident that if $\rho_{HI_G} = 1$, $\sigma^2_{I_G} = \sigma^2_H$, and if $\rho_{HI_G} = 0$, $\sigma^2_{I_G} = 0$. That is, the variance of $H$ explained by $I_G$ is proportional to $\rho^2_{HI_G}$: when $\rho_{HI_G}$ is close to 1, $\sigma^2_{I_G}$ is close to $\sigma^2_H$, and when $\rho_{HI_G}$ is close to 0, $\sigma^2_{I_G}$ is close to 0.


The LGSI properties described in points 1 to 4 of this subsection are the same as the LPSI properties described in Chap. 2. This corroborates the LGSI as an application of the LPSI theory to the genomic selection context.


5.1.7 Genomic Covariance Matrix in the Training 
and Testing Population 


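To derive the LGSI theory, we assumed that the true genomic additive variance-covariance matrix $\boldsymbol{\Gamma}$ was known. However, in practice, we need to estimate it. In the training population, matrix $\boldsymbol{\Gamma}$ can be estimated by restricted maximum likelihood (REML) using phenotypic and genomic information, as described by Vattikuti et al. (2012) and Su et al. (2012). In Eqs. (2.22) to (2.24) of Chap. 2, we presented the formulas for estimating the genotypic and residual variance and covariance based on the formulas described by Lynch and Walsh (1998). Here, we present a brief description of how we can estimate the $q$th diagonal component ($\sigma^2_{\gamma_q}$) of $\boldsymbol{\Gamma}$ in the training population using the REML method.

We estimated $\sigma_{\gamma_{qq}} = \sigma^2_{\gamma_q}$ ($q = 1, 2, \ldots, t$; $t$ = number of traits) in the absence of dominance and epistatic effects, using the model $\mathbf{y}_q = \mathbf{1}\mu_q + \mathbf{Z}\boldsymbol{\gamma}_q + \mathbf{e}_q$, where the vector $\mathbf{y}_q \sim \mathrm{NMV}(\mathbf{1}\mu_q, \mathbf{V}_q)$ of size $g \times 1$ ($g$ = number of genotypes in the population) had a multivariate normal distribution; $\mathbf{1}$ was a $g \times 1$ vector of 1s, $\mu_q$ was the mean of the $q$th trait, $\mathbf{Z}$ was an identity matrix of size $g \times g$; $\boldsymbol{\gamma}_q \sim \mathrm{NMV}(\mathbf{0}, \mathbf{G}\sigma^2_{\gamma_q})$ was a vector of genomic breeding values, and $\mathbf{e}_q \sim \mathrm{NMV}(\mathbf{0}, \mathbf{I}\sigma^2_{e_q})$ was a $g \times 1$ vector of residuals. Matrix $\mathbf{G} = \mathbf{X}\mathbf{X}'/c$ was the genomic relationship matrix, and in an F2 population, $c = \sum_{j=1}^{m} 2p_jq_j$; $\mathbf{X}$ was a $g \times m$ matrix ($m$ = number of markers) of the coded marker values ($2 - 2p_j$ for AA, $1 - 2p_j$ for Aa, and $-2p_j$ for aa) for the additive effects of the markers; $p_j$ and $q_j$ denote the frequency of allele A and the frequency of allele a in the $j$th marker ($j = 1, 2, \ldots, m$), and $\mathbf{V}_q = \mathbf{G}\sigma^2_{\gamma_q} + \mathbf{I}\sigma^2_{e_q}$.

The expectation-maximization algorithm allowed the REML estimates of the variance components $\sigma^2_{\gamma_q}$ and $\sigma^2_{e_q}$ to be computed by iterating the following equations:

$$\sigma^{2(n+1)}_{\gamma_q} = \sigma^{2(n)}_{\gamma_q} + \frac{\left(\sigma^{2(n)}_{\gamma_q}\right)^2}{g}\left[\mathbf{y}_q'\mathbf{T}^{(n)}\mathbf{G}\mathbf{T}^{(n)}\mathbf{y}_q - \mathrm{tr}\left(\mathbf{T}^{(n)}\mathbf{G}\right)\right] \tag{5.21}$$

and

$$\sigma^{2(n+1)}_{e_q} = \sigma^{2(n)}_{e_q} + \frac{\left(\sigma^{2(n)}_{e_q}\right)^2}{g}\left[\mathbf{y}_q'\mathbf{T}^{(n)}\mathbf{T}^{(n)}\mathbf{y}_q - \mathrm{tr}\left(\mathbf{T}^{(n)}\right)\right], \tag{5.22}$$

where $g$ is the number of genotypes. After $n$ iterations, when $\sigma^{2(n+1)}_{\gamma_q}$ was very similar to $\sigma^{2(n)}_{\gamma_q}$, and $\sigma^{2(n+1)}_{e_q}$ was very similar to $\sigma^{2(n)}_{e_q}$, $\hat{\sigma}^2_{\gamma_q}$ and $\hat{\sigma}^2_{e_q}$ were the estimated variance components of $\sigma^2_{\gamma_q}$ and $\sigma^2_{e_q}$ respectively. In Eqs. (5.21) and (5.22), tr(.) denoted the trace of the matrices within brackets; $\mathbf{T} = \mathbf{V}_q^{-1} - \mathbf{V}_q^{-1}\mathbf{1}(\mathbf{1}'\mathbf{V}_q^{-1}\mathbf{1})^{-1}\mathbf{1}'\mathbf{V}_q^{-1}$ and $\mathbf{V}_q^{-1}$ was the inverse of $\mathbf{V}_q = \mathbf{G}\sigma^2_{\gamma_q} + \mathbf{I}\sigma^2_{e_q}$. In matrix $\mathbf{T}^{(n)}$, $\mathbf{V}_q^{-1(n)}$ was the inverse of matrix $\mathbf{V}_q^{(n)} = \mathbf{G}\sigma^{2(n)}_{\gamma_q} + \mathbf{I}\sigma^{2(n)}_{e_q}$.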
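
A minimal sketch of the iterations in Eqs. (5.21) and (5.22), assuming NumPy and a single simulated trait, is given next. The starting values, the stopping rule, and the simulated data are arbitrary choices made for illustration and do not reproduce the software used for the analyses in this chapter.

```python
import numpy as np

def em_reml(y, G, n_iter=500, tol=1e-10):
    """Sketch of the EM-REML iterations in Eqs. (5.21) and (5.22) for one trait."""
    g = len(y)
    one = np.ones((g, 1))
    s2_gamma = s2_e = np.var(y) / 2.0               # starting values (arbitrary choice)
    for _ in range(n_iter):
        V_inv = np.linalg.inv(G * s2_gamma + np.eye(g) * s2_e)
        V1 = V_inv @ one
        T = V_inv - V1 @ V1.T / (one.T @ V1)        # T = V^-1 - V^-1 1 (1'V^-1 1)^-1 1'V^-1
        Ty = T @ y
        new_gamma = s2_gamma + s2_gamma**2 / g * (Ty @ G @ Ty - np.trace(T @ G))
        new_e = s2_e + s2_e**2 / g * (Ty @ Ty - np.trace(T))
        change = abs(new_gamma - s2_gamma) + abs(new_e - s2_e)
        s2_gamma, s2_e = new_gamma, new_e
        if change < tol:                            # stop when both components stabilize
            break
    return s2_gamma, s2_e

# Simulated example (hypothetical sizes and variances).
rng = np.random.default_rng(0)
g, m = 60, 40
X = rng.integers(-1, 2, size=(g, m)).astype(float)
c = m / 2.0
G = X @ X.T / c
u = rng.normal(scale=np.sqrt(2.0 / c), size=m)      # implies Var(Xu) = 2.0 * G
y = 10.0 + X @ u + rng.normal(scale=1.0, size=g)
s2_gamma_hat, s2_e_hat = em_reml(y, G)
```

EM iterations of this kind tend to converge slowly, so the fixed iteration cap is a practical safeguard rather than a recommendation.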

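The genomic additive genetic covariance between the observations of the $q$th and $i$th traits, $\mathbf{y}_q$ and $\mathbf{y}_i$ ($\sigma_{\gamma_{iq}}$; $q, i = 1, 2, \ldots, t$), can be estimated by REML. Here, we adapted Eqs. (5.21) and (5.22) using the variance of the sum of $\mathbf{y}_q$ and $\mathbf{y}_i$, i.e., $\mathrm{Var}(\mathbf{y}_i + \mathbf{y}_q) = \mathbf{V}_i + \mathbf{V}_q + 2\mathbf{C}_{iq}$, where $\mathbf{V}_i = \mathbf{G}\sigma^2_{\gamma_i} + \mathbf{I}\sigma^2_{e_i} = \mathrm{Var}(\mathbf{y}_i)$ is the variance of $\mathbf{y}_i$ and $\mathbf{V}_q = \mathbf{G}\sigma^2_{\gamma_q} + \mathbf{I}\sigma^2_{e_q} = \mathrm{Var}(\mathbf{y}_q)$ is the variance of $\mathbf{y}_q$; $2\mathbf{C}_{iq} = 2\mathbf{G}\sigma_{\gamma_{iq}} + 2\mathbf{I}\sigma_{e_{iq}} = 2\mathrm{Cov}(\mathbf{y}_i, \mathbf{y}_q)$ is twice the covariance of $\mathbf{y}_q$ and $\mathbf{y}_i$, and $\sigma_{\gamma_{iq}}$ and $\sigma_{e_{iq}}$ are the genomic and residual covariances respectively, associated with $\mathbf{y}_i$ and $\mathbf{y}_q$. Thus, one way of estimating $\sigma_{\gamma_{iq}}$ and $\sigma_{e_{iq}}$ is by using the following equation:

$$0.5\,\mathrm{Var}(\mathbf{y}_i + \mathbf{y}_q) - 0.5\,\mathrm{Var}(\mathbf{y}_i) - 0.5\,\mathrm{Var}(\mathbf{y}_q), \tag{5.23}$$

for which Eqs. (5.21) and (5.22) can be adapted.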
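
One possible reading of how Eqs. (5.21) and (5.22) are adapted to Eq. (5.23), offered here as an assumption rather than as the exact procedure used for this chapter, is that the EM iterations are run three times (on $\mathbf{y}_i$, on $\mathbf{y}_q$, and on the sum $\mathbf{y}_i + \mathbf{y}_q$) and that the covariance components are then recovered from the three sets of variance components. The subscript $(i+q)$ for the components estimated from the sum is notation introduced only for this illustration.

```latex
\begin{aligned}
\mathrm{Var}(\mathbf{y}_i + \mathbf{y}_q)
  &= \mathbf{G}\left(\sigma^2_{\gamma_i} + \sigma^2_{\gamma_q} + 2\sigma_{\gamma_{iq}}\right)
   + \mathbf{I}\left(\sigma^2_{e_i} + \sigma^2_{e_q} + 2\sigma_{e_{iq}}\right), \\
\hat{\sigma}_{\gamma_{iq}}
  &= 0.5\,\hat{\sigma}^2_{\gamma_{(i+q)}} - 0.5\,\hat{\sigma}^2_{\gamma_i} - 0.5\,\hat{\sigma}^2_{\gamma_q}, \\
\hat{\sigma}_{e_{iq}}
  &= 0.5\,\hat{\sigma}^2_{e_{(i+q)}} - 0.5\,\hat{\sigma}^2_{e_i} - 0.5\,\hat{\sigma}^2_{e_q}.
\end{aligned}
```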

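If there is only marker information on the testing population, then it is not possible to estimate $\boldsymbol{\Gamma}$ using Eqs. (5.21) to (5.23). Another way of estimating $\boldsymbol{\Gamma}$ is to use the method proposed by Ceron-Rojas et al. (2015), which requires the estimated values of $\boldsymbol{\gamma}_q$ ($\hat{\boldsymbol{\gamma}}_q$) in the cycle of interest. Let $\hat{\mathbf{u}}$ be the estimator of the vector of marker effects $\mathbf{u}' = [\mathbf{u}_1' \; \mathbf{u}_2' \; \cdots \; \mathbf{u}_t']$ for $t$ traits obtained in the training population. We obtained the $q$th GEBVs ($q = 1, 2, \ldots, t$) in the $l$th selection cycle ($l = 1, 2, \ldots$, number of cycles) as

$$\hat{\boldsymbol{\gamma}}_{ql} = \mathbf{X}_l\hat{\mathbf{u}}_q, \tag{5.24}$$

where $\hat{\mathbf{u}}_q$ is the vector of size $m \times 1$ of the estimated marker effects of the $q$th trait in the training population and $\mathbf{X}_l$ is a matrix of size $n \times m$ of the coded values of marker genotypes in the $l$th selection cycle of the testing population.

Now suppose that $\hat{\boldsymbol{\gamma}}_{ql}$ and $\hat{\boldsymbol{\gamma}}_{q'l}$ jointly have a multivariate normal distribution, with means $\mathbf{1}\mu_{\gamma_{ql}}$ and $\mathbf{1}\mu_{\gamma_{q'l}}$ respectively, and covariance matrix $\mathbf{G}_l\sigma_{\gamma_{qq'}}$, where $\mathbf{1}$ is an $n \times 1$ vector of 1s and $\mathbf{G}_l = \mathbf{X}_l\mathbf{X}_l'/c$ is the additive genomic relationship matrix. Then, $\boldsymbol{\Gamma} = \{\sigma_{\gamma_{qq'}}\}$ can be estimated as

$$\hat{\boldsymbol{\Gamma}}_l = \{\hat{\sigma}_{\gamma_{qq'l}}\}, \tag{5.25}$$

where $\hat{\sigma}_{\gamma_{qq'l}} = \frac{1}{g}(\hat{\boldsymbol{\gamma}}_{ql} - \mathbf{1}\hat{\mu}_{\gamma_{ql}})'\mathbf{G}_l^{-1}(\hat{\boldsymbol{\gamma}}_{q'l} - \mathbf{1}\hat{\mu}_{\gamma_{q'l}})$ is the estimated covariance between $\hat{\boldsymbol{\gamma}}_{ql}$ and $\hat{\boldsymbol{\gamma}}_{q'l}$ in the $l$th selection cycle of the testing population; $g$ is the number of genotypes; $\hat{\boldsymbol{\gamma}}_{ql}$ was defined in Eq. (5.24); $\hat{\mu}_{\gamma_{ql}}$ and $\hat{\mu}_{\gamma_{q'l}}$ are the estimated arithmetic means of the values of $\hat{\boldsymbol{\gamma}}_{ql}$ and $\hat{\boldsymbol{\gamma}}_{q'l}$; $\mathbf{1}$ is a $g \times 1$ vector of 1s and $\mathbf{G}_l = c^{-1}\mathbf{X}_l\mathbf{X}_l'$ is the additive genomic relationship matrix in the $l$th selection cycle ($l = 1, 2, \ldots$, number of cycles) in the testing population.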
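
The estimator in Eq. (5.25) can be sketched as follows, assuming NumPy and hypothetical testing-population data. Note one deliberate substitution: when there are more genotypes than markers, $\mathbf{G}_l$ is singular, so the sketch uses the Moore-Penrose pseudo-inverse (`np.linalg.pinv`) in place of $\mathbf{G}_l^{-1}$; this is an assumption of the example, not something stated in the text.

```python
import numpy as np

def estimate_gamma(gebv, G_l):
    """Sketch of Eq. (5.25): t x t genomic covariance matrix from g x t GEBVs."""
    g = gebv.shape[0]
    centered = gebv - gebv.mean(axis=0)             # gamma_hat - 1 * mean, per trait
    G_inv = np.linalg.pinv(G_l)                     # assumption: pseudo-inverse of G_l
    return centered.T @ G_inv @ centered / g

# Hypothetical testing-population data: 50 genotypes, 20 markers, 3 traits.
rng = np.random.default_rng(0)
g, m, t = 50, 20, 3
X_l = rng.integers(-1, 2, size=(g, m)).astype(float)
U_hat = rng.normal(scale=0.1, size=(m, t))          # stand-in for estimated marker effects
gebv_l = X_l @ U_hat                                # Eq. (5.24), one column per trait
G_l = X_l @ X_l.T / (m / 2.0)
Gamma_hat = estimate_gamma(gebv_l, G_l)
```

The result is a symmetric $t \times t$ matrix whose diagonal holds the estimated genomic variances and whose off-diagonal entries hold the estimated genomic covariances.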

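From Eq. (5.25) we can estimate the LGSI response and expected genetic gain per trait in the testing population as

$$\hat{R}_{I_G} = \frac{k_I}{L_G}\sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}\mathbf{w}} \quad \text{and} \quad \hat{\mathbf{E}}_{I_G} = \frac{k_I}{L_G}\frac{\hat{\boldsymbol{\Gamma}}\mathbf{w}}{\sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}\mathbf{w}}}, \tag{5.26}$$

respectively. The estimated LGSI ($\hat{I}_G$) values in the $l$th selection cycle can be obtained as

$$\hat{I}_G = \sum_{q=1}^{t} w_q\hat{\boldsymbol{\gamma}}_{ql}, \tag{5.27}$$

where $w_q$ is the $q$th economic weight and $\hat{\boldsymbol{\gamma}}_{ql}$ was defined in Eq. (5.24). Equation (5.27) is a vector of size $g \times 1$ ($g$ = number of genotypes). In practice, $\hat{I}_G$ values are ranked to select individual genotypes with optimal GEBVs.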
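
Equations (5.26) and (5.27) and the ranking step can be sketched in a few lines of NumPy. The GEBVs, the covariance matrix, and the helper names below are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def lgsi_response_and_gain(Gamma_hat, w, k_I, L_G=1.0):
    """Eq. (5.26): estimated LGSI response and expected genetic gain per trait."""
    sd_index = np.sqrt(w @ Gamma_hat @ w)
    return k_I / L_G * sd_index, k_I / L_G * (Gamma_hat @ w) / sd_index

def lgsi_values(gebv, w):
    """Eq. (5.27): index value of every genotype (rows of gebv)."""
    return gebv @ w

def select_top(index_values, proportion):
    """Positions of the top-ranked genotypes, best first."""
    n_sel = max(1, int(round(proportion * len(index_values))))
    return np.argsort(index_values)[::-1][:n_sel]

# Hypothetical GEBVs for four genotypes and two traits.
gebv = np.array([[0.2, 1.0], [0.5, -1.0], [-0.1, 0.0], [0.3, 2.0]])
w = np.array([5.0, -0.1])
index = lgsi_values(gebv, w)                        # [0.9, 2.6, -0.5, 1.3]
selected = select_top(index, proportion=0.5)        # genotypes 1 and 3 (0-based)

Gamma_hat = np.array([[1.0, 0.4], [0.4, 2.0]])      # illustrative matrix
R_hat, E_hat = lgsi_response_and_gain(Gamma_hat, w, k_I=1.755, L_G=1.5)
```

A useful consistency check is that the economic weights applied to the expected gains recover the response, i.e., $\mathbf{w}'\hat{\mathbf{E}}_{I_G} = \hat{R}_{I_G}$.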


5.1.8 Numerical Examples


To estimate matrices $\mathbf{C}$ and $\mathbf{R}$ and the marker effects in the training population, we used a real maize (Zea mays) F2 population with 248 genotypes (each with two repetitions), 233 molecular markers, and three traits, grain yield (GY, ton ha$^{-1}$), ear height (EHT, cm), and plant height (PHT, cm), evaluated in one environment. The estimated matrices were

$$\hat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix} \quad \text{and} \quad \hat{\mathbf{R}} = \begin{bmatrix} 0.38 & 0.72 & 1.27 \\ 0.72 & 47.14 & 60.96 \\ 1.27 & 60.96 & 121.46 \end{bmatrix},$$

which were estimated by Eqs. (5.21) to (5.23) using the numerical relationship matrix $\mathbf{A}$ instead of the genomic relationship matrix ($\mathbf{G} = \mathbf{X}\mathbf{X}'/c$).
Table 5.1 presents the first 20 BLUPs of the estimated marker effects (Eq. 5.8) in the training population and the first 20 marker coded values and GEBVs (Eq. 5.9) obtained in the testing population associated with trait GY. In the testing population, there were 380 genotypes and 233 molecular markers. In this population, the estimated genomic covariance matrix $\hat{\boldsymbol{\Gamma}} = \{\hat{\sigma}_{\gamma_{qq'}}\}$ was

$$\hat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.21 & 2.95 & 5.00 \\ 2.95 & 42.41 & 71.11 \\ 5.00 & 71.11 & 121.53 \end{bmatrix}.$$

The first GEBV (0.195) related to GY in Table 5.1 was obtained as 0.195 = -0.0003(1) - 0.0038(1) - 0.0085(0) + ... - 0.03(-1). The other GEBVs can be obtained in a similar manner. Suppose a selection intensity of 10% ($k_I = 1.755$) and a vector of economic weights of $\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1]$; then, the estimated LGSI selection response and the expected genetic gain per trait, without including the interval between selection cycles, are $\hat{R}_{I_G} = (1.755)\sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}\mathbf{w}} = 0.92$ and $\hat{\mathbf{E}}_{I_G}' = (1.755)\dfrac{\mathbf{w}'\hat{\boldsymbol{\Gamma}}}{\sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}\mathbf{w}}} = [0.80 \;\; 11.41 \;\; 19.28]$ respectively, whereas the estimated LGSI accuracy was $\hat{\rho}_{HI_G} = 0.48$.


Table 5.1 The first 20 best linear unbiased predictors (BLUPs) of the estimated marker effects in the training population and the first 20 marker coded values and genomic estimated breeding values (GEBVs) obtained in the testing population associated with grain yield

| BLUPs (training population) | M1 | M2 | M3 | ... | M233 | GEBVs (testing population) |
|---|---|---|---|---|---|---|
| -0.0003 | 1 | 1 | 0 | ... | -1 | 0.195 |
| -0.0038 | 0 | 0 | 0 | ... | -1 | 0.221 |
| -0.0085 | -1 | 1 | 0 | ... | -1 | -0.643 |
| 0.0069 | 0 | 1 | 0 | ... | 1 | 0.525 |
| -0.0042 | 0 | 0 | 0 | ... | 0 | -0.603 |
| 0.0038 | -1 | 0 | 0 | ... | 0 | 0.062 |
| 0.0008 | 0 | 1 | 1 | ... | 0 | -0.226 |
| 0.0012 | 0 | 1 | 1 | ... | 1 | 0.023 |
| -0.0004 | 0 | -1 | 0 | ... | 0 | 0.444 |
| 0.0062 | 0 | 0 | 1 | ... | -1 | -0.286 |
| 0.0121 | -1 | 1 | 0 | ... | 1 | -0.196 |
| 0.0077 | -1 | -1 | -1 | ... | 0 | -0.566 |
| 0.0033 | -1 | 0 | 0 | ... | 0 | 0.073 |
| 0.0102 | -1 | 1 | 0 | ... | 1 | 0.058 |
| 0.0054 | 0 | 1 | 0 | ... | 0 | 0.874 |
| 0.0002 | 0 | 0 | 0 | ... | 0 | 0.102 |
| 0.0171 | 0 | 1 | 0 | ... | -1 | -0.342 |
| 0.0159 | -1 | 0 | 1 | ... | -1 | -0.428 |
| 0.0117 | -1 | 0 | 0 | ... | -1 | 0.072 |
| 0.0121 | 0 | -1 | 0 | ... | -1 | -0.428 |

Chapter 11 presents RIndSel, a graphical user interface that uses selection index theory to select individual candidates as parents for the next selection cycle, which can be used to obtain the results of the real numerical example described in this subsection.

To compare LGSI efficiency versus LPSI efficiency we used the simulated data described in Chap. 2, Sect. 2.8.1. According to Beyene et al. (2015), at least 4 years are required to complete one phenotypic selection cycle in maize, whereas genomic selection requires only 1.5 years. Thus, to compare LGSI efficiency versus LPSI efficiency in terms of time, we can use the Technow et al. (2013) inequality described in Eq. (5.18).

Table 5.2 presents the estimated value of Eq. (5.18) for five simulated selection cycles. The LGSI efficiency was higher than LPSI efficiency in terms of time, because the Technow et al. (2013) inequality was true in the five selection cycles. An additional result obtained by Ceron-Rojas et al. (2015) is presented in Fig. 5.2, which shows the correlation among the LGSI, the LPSI, and the true net genetic merit values in seven selection cycles. According to Fig. 5.2, the correlation between the LGSI and the true net genetic merit values was higher than the correlation between the LPSI and the true net genetic merit values for the first three selection cycles; after the third cycle, the correlation between the LGSI and the true net genetic merit values tended to decrease.


Table 5.2 Five simulated selection cycles. Time required for the linear genomic selection index ($L_G$) and linear phenotypic selection index ($L_P$) to complete one selection cycle; estimated accuracy ($\hat{\rho}_{HI_G}$) of the linear genomic selection index and the square root of the estimated heritability of the linear phenotypic selection index ($\hat{h}_I$); estimated right-hand side ($\frac{\hat{\rho}_{HI_G}}{\hat{h}_I}L_P$) of the inequality formula ($L_G < \frac{\rho_{HI_G}}{h_I}L_P$)


Fig. 5.2 Correlation between the linear genomic selection index (LGSI), the linear phenotypic selection index (LPSI), and true net genetic merit (H) values in seven selection cycles. For each selection cycle, the first column indicates the correlation between the LGSI estimated values and the H true values, whereas the second column shows the correlation between the LPSI estimated values and the H true values


5.2 The Combined Linear Genomic Selection Index


The combined LGSI (CLGSI) developed by Dekkers (2007) is a slightly modified
version of the LMSI (see Chap. 4 for details), which, instead of using the marker
scores, uses the GEBVs and the phenotypic information jointly to predict the net
genetic merit. The main difference between the CLGSI and the LGSI is that the
CLGSI can only be used in training populations, whereas the LGSI is used in testing
populations. The basic conditions for constructing a valid CLGSI include the conditions
for constructing the LPSI, the LMSI, and the LGSI, because the CLGSI uses GEBVs
and phenotypic information jointly to predict the net genetic merit.
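

5.2.1 The CLGSI Parameters


The net genetic merit can be written in a similar manner to that in the LMSI context, that is, as

$$H = \mathbf{w}'\mathbf{g} + \mathbf{w}_2'\boldsymbol{\gamma} = [\mathbf{w}' \;\; \mathbf{w}_2']\begin{bmatrix}\mathbf{g} \\ \boldsymbol{\gamma}\end{bmatrix} = \mathbf{a}_C'\mathbf{z}_C, \tag{5.28}$$

where $\mathbf{g}' = [g_1 \; \ldots \; g_t]$ is the vector of breeding values, $\mathbf{w}' = [w_1 \; \cdots \; w_t]$ is the vector of economic weights associated with the breeding values, $\mathbf{w}_2' = [0_1 \; \ldots \; 0_t]$ is a null vector associated with the vector of genomic breeding values $\boldsymbol{\gamma}' = [\gamma_1 \; \gamma_2 \; \ldots \; \gamma_t]$, $\mathbf{a}_C' = [\mathbf{w}' \;\; \mathbf{w}_2']$ and $\mathbf{z}_C' = [\mathbf{g}' \;\; \boldsymbol{\gamma}']$.

The CLGSI can be written as

$$I_C = \boldsymbol{\beta}_y'\mathbf{y} + \boldsymbol{\beta}_G'\boldsymbol{\gamma} = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_G']\begin{bmatrix}\mathbf{y} \\ \boldsymbol{\gamma}\end{bmatrix} = \boldsymbol{\beta}_C'\mathbf{t}_C, \tag{5.29}$$

where $\mathbf{y}' = [y_1 \; \cdots \; y_t]$ ($t$ = number of traits) is the vector of phenotypic values; $\boldsymbol{\gamma}$ was defined earlier; $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_G$ are vectors of coefficients of phenotypic and genomic weight values respectively; $\boldsymbol{\beta}_C' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_G']$ and $\mathbf{t}_C' = [\mathbf{y}' \;\; \boldsymbol{\gamma}']$.

The CLGSI selection response can be written as

$$R_C = k_I\sigma_H\rho_{HI_C} = k_I\sigma_H\frac{\mathbf{a}_C'\boldsymbol{\Psi}_C\boldsymbol{\beta}_C}{\sqrt{\mathbf{a}_C'\boldsymbol{\Psi}_C\mathbf{a}_C}\sqrt{\boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C}}, \tag{5.30}$$

where $k_I$ is the standardized selection differential of the CLGSI, $\sigma^2_H = \mathbf{a}_C'\boldsymbol{\Psi}_C\mathbf{a}_C$ and $\mathrm{Var}(I_C) = \boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C$ are the variances of $H$ and $I_C$, whereas $\mathbf{a}_C'\boldsymbol{\Psi}_C\boldsymbol{\beta}_C$ and $\rho_{HI_C}$ are the covariance and the correlation between $H$ and $I_C$ respectively; $\mathbf{T}_C = \mathrm{Var}\begin{bmatrix}\mathbf{y} \\ \boldsymbol{\gamma}\end{bmatrix} = \begin{bmatrix}\mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\end{bmatrix}$ and $\boldsymbol{\Psi}_C = \mathrm{Var}\begin{bmatrix}\mathbf{g} \\ \boldsymbol{\gamma}\end{bmatrix} = \begin{bmatrix}\mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\end{bmatrix}$ are block matrices of the phenotypic covariance matrix, $\mathbf{P} = \mathrm{Var}(\mathbf{y})$, the genomic covariance matrix, $\boldsymbol{\Gamma} = \mathrm{Var}(\boldsymbol{\gamma})$, and the genetic breeding values covariance matrix, $\mathbf{C} = \mathrm{Var}(\mathbf{g})$.

Suppose that matrices $\boldsymbol{\Psi}_C$ and $\mathbf{T}_C$ are known; then the CLGSI vector of coefficients that simultaneously maximizes $\rho_{HI_C}$ and $R_C$ can be written as

$$\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\boldsymbol{\Psi}_C\mathbf{a}_C, \tag{5.31}$$

whence the optimized CLGSI is

$$I_C = \boldsymbol{\beta}_C'\mathbf{t}_C. \tag{5.32}$$

Equations (5.31) and (5.32) indicate that the CLGSI is an application of the LPSI to the genomic selection context.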
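
From Eq. (5.31), the maximized CLGSI selection response, expected genetic gain per trait, and accuracy can be written as

$$R_C = k_I\sqrt{\boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C}, \tag{5.33}$$

$$\mathbf{E}_C = k_I\frac{\boldsymbol{\Psi}_C\boldsymbol{\beta}_C}{\sqrt{\boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C}}, \tag{5.34}$$

and

$$\rho_{HI_C} = \frac{\sqrt{\boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}, \tag{5.35}$$

respectively. Note that the maximized LPSI accuracy is $\rho_{HI} = \dfrac{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}}$ (see Chap. 2). The denominator of the CLGSI accuracy and of $\rho_{HI}$ is the same; however, the numerators of the two accuracies are different. We would expect that $\sqrt{\boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C} \geq \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$, and then $\rho_{HI_C} \geq \rho_{HI}$. Similar results can be observed when we compare the maximized LPSI selection response and expected genetic gain per trait with the maximized CLGSI selection response and expected genetic gain per trait.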


5.2.2 Relationship Between the CLGSI and the LGSI 


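As we have indicated, the CLGSI is mathematically equivalent to the LMSI; thus, it has statistical properties similar to those of the LMSI, some of which are described in this section. The rest can be seen in Chap. 4. Let $\mathbf{Q}_C = \mathbf{T}_C^{-1}\boldsymbol{\Psi}_C$; then matrix $\mathbf{Q}_C$ can be written as

$$\mathbf{Q}_C = \begin{bmatrix}(\mathbf{P} - \boldsymbol{\Gamma})^{-1}(\mathbf{C} - \boldsymbol{\Gamma}) & \mathbf{0} \\ \mathbf{I} - (\mathbf{P} - \boldsymbol{\Gamma})^{-1}(\mathbf{C} - \boldsymbol{\Gamma}) & \mathbf{I}\end{bmatrix}, \tag{5.36}$$

whence, as $\mathbf{w}_2' = [0_1 \; \cdots \; 0_t]$, the two sub-vectors that make up vector $\boldsymbol{\beta}_C = \mathbf{Q}_C\mathbf{a}_C$, or $\boldsymbol{\beta}_C' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_G']$, can be written as

$$\boldsymbol{\beta}_y = (\mathbf{P} - \boldsymbol{\Gamma})^{-1}(\mathbf{C} - \boldsymbol{\Gamma})\mathbf{w} \tag{5.37}$$

and

$$\boldsymbol{\beta}_G = [\mathbf{I} - (\mathbf{P} - \boldsymbol{\Gamma})^{-1}(\mathbf{C} - \boldsymbol{\Gamma})]\mathbf{w} = \mathbf{w} - \boldsymbol{\beta}_y. \tag{5.38}$$

When $\boldsymbol{\Gamma}$ is equal to the null matrix (no genomic information), Eq. (5.37) is equal to $\boldsymbol{\beta}_y = \mathbf{P}^{-1}\mathbf{C}\mathbf{w} = \mathbf{b}$ and $R_C = k_I\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} = R_I$, which are the LPSI vector of coefficients and the selection response.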
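
The block form in Eq. (5.36) can be checked against a direct solve of $\mathbf{T}_C^{-1}\boldsymbol{\Psi}_C$. The sketch below assumes NumPy and builds random positive definite matrices so that $\mathbf{P} - \boldsymbol{\Gamma}$ and $\boldsymbol{\Gamma}$ are invertible; the matrices and the helper `random_spd` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_spd(t):
    """Random symmetric positive definite t x t matrix (illustrative only)."""
    A = rng.normal(size=(t, t))
    return A @ A.T + t * np.eye(t)

t = 3
Gamma = random_spd(t)
C = Gamma + random_spd(t)                           # C - Gamma positive definite
P = C + random_spd(t)                               # P = C + residual covariance
w = np.array([5.0, -0.1, -0.1])
I_t, zeros = np.eye(t), np.zeros((t, t))

T_C = np.block([[P, Gamma], [Gamma, Gamma]])
Psi_C = np.block([[C, Gamma], [Gamma, Gamma]])
Q_direct = np.linalg.solve(T_C, Psi_C)              # T_C^{-1} Psi_C

A = np.linalg.solve(P - Gamma, C - Gamma)           # (P - Gamma)^{-1} (C - Gamma)
Q_block = np.block([[A, zeros], [I_t - A, I_t]])    # Eq. (5.36)

beta_y = A @ w                                      # Eq. (5.37)
beta_G = (I_t - A) @ w                              # Eq. (5.38)
```

The two routes should agree to numerical precision, and the genomic weights equal the economic weights minus the phenotypic weights.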

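By Eqs. (5.37) and (5.38), the maximized CLGSI selection response and the optimized CLGSI can be written as

$$R_C = k_I\sqrt{\mathbf{w}'\mathbf{C}(\mathbf{P} - \boldsymbol{\Gamma})^{-1}(\mathbf{C} - \boldsymbol{\Gamma})\mathbf{w} + \mathbf{w}'\boldsymbol{\Gamma}[\mathbf{I} - (\mathbf{P} - \boldsymbol{\Gamma})^{-1}(\mathbf{C} - \boldsymbol{\Gamma})]\mathbf{w}} \tag{5.39}$$

and

$$I_C = \boldsymbol{\beta}_y'\mathbf{y} + \boldsymbol{\beta}_G'\boldsymbol{\gamma} = \mathbf{w}'\boldsymbol{\gamma} + \boldsymbol{\beta}_y'(\mathbf{y} - \boldsymbol{\gamma}), \tag{5.40}$$

respectively.

Assume that when the number of markers and genotypes increases, matrix $\boldsymbol{\Gamma}$ tends to matrix $\mathbf{C}$ and that, at the limit, $\boldsymbol{\Gamma} = \mathbf{C}$; then, Eq. (5.39) can be written as $R_C = k_I\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}} = R_{I_G}$ (except for $L_G$); in addition, $\boldsymbol{\beta}_y = \mathbf{0}$ and $\boldsymbol{\beta}_G = \mathbf{w}$, the weights of the LGSI, and, in this latter case, the CLGSI is equal to the LGSI, as we would expect. Thus, in the asymptotic context, the LGSI and the CLGSI are the same.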

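An additional interesting result of the relationship between the CLGSI and the LGSI is as follows. The maximized correlation between $H$ and $I_C$ (or CLGSI accuracy) can be written as

$$\rho_{HI_C} = \frac{\mathbf{a}_C'\boldsymbol{\Psi}_C\boldsymbol{\beta}_C}{\sqrt{\mathbf{a}_C'\boldsymbol{\Psi}_C\mathbf{a}_C}\sqrt{\boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C}}. \tag{5.41}$$

However, when $\boldsymbol{\Gamma} = \mathbf{C}$, $\boldsymbol{\Psi}_C = \begin{bmatrix}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\end{bmatrix}$, $\boldsymbol{\beta}_y = \mathbf{0}$, $\boldsymbol{\beta}_G = \mathbf{w}$ and $\boldsymbol{\beta}_C' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_G'] = [\mathbf{0}' \;\; \mathbf{w}']$, whence $\mathbf{a}_C'\boldsymbol{\Psi}_C\boldsymbol{\beta}_C = \mathbf{a}_C'\boldsymbol{\Psi}_C\mathbf{a}_C = \boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C = \mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}$, and Eq. (5.41) is equal to 1. That is, the maximum correlation between $H$ and $I_C$ in the asymptotic context is equal to the maximum correlation between $H$ and the LGSI, and that value will be equal to 1.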
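
The asymptotic case can be illustrated numerically by setting $\mathbf{C} = \boldsymbol{\Gamma}$ in Eq. (5.31) and evaluating Eq. (5.41). The sketch assumes NumPy; the matrices are hypothetical and chosen only to be positive definite.

```python
import numpy as np

# Illustrative matrices with Gamma = C (asymptotic case) and P = Gamma + R.
Gamma = np.array([[1.0, 0.4, 0.2], [0.4, 2.0, 0.5], [0.2, 0.5, 3.0]])
R = np.array([[2.0, 0.3, 0.1], [0.3, 4.0, 0.6], [0.1, 0.6, 5.0]])
C, P = Gamma, Gamma + R
w = np.array([5.0, -0.1, -0.1])

T_C = np.block([[P, Gamma], [Gamma, Gamma]])
Psi_C = np.block([[C, Gamma], [Gamma, Gamma]])
a_C = np.concatenate([w, np.zeros(3)])

beta_C = np.linalg.solve(T_C, Psi_C @ a_C)          # Eq. (5.31)
beta_y, beta_G = beta_C[:3], beta_C[3:]
accuracy = (a_C @ Psi_C @ beta_C) / np.sqrt(
    (a_C @ Psi_C @ a_C) * (beta_C @ T_C @ beta_C))  # Eq. (5.41)
```

Under these assumptions the phenotypic weights vanish, the genomic weights equal the economic weights, and the accuracy equals 1 up to rounding error.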

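The asymptotic relationship between the CLGSI expected genetic gain per trait, $\mathbf{E}_C$ (Eq. 5.34), and the LGSI expected genetic gain per trait, $\mathbf{E}_{I_G}$ (Eq. 5.16), is as follows. When $\boldsymbol{\Gamma} = \mathbf{C}$, $\boldsymbol{\Psi}_C = \begin{bmatrix}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\end{bmatrix}$ and $\boldsymbol{\beta}_C' = [\mathbf{0}' \;\; \mathbf{w}']$, whence

$$\mathbf{E}_C = k_I\frac{\boldsymbol{\Psi}_C\boldsymbol{\beta}_C}{\sqrt{\boldsymbol{\beta}_C'\mathbf{T}_C\boldsymbol{\beta}_C}} = k_I\frac{2\boldsymbol{\Gamma}\mathbf{w}}{\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}}. \tag{5.42}$$

This means that in the asymptotic context, the CLGSI expected genetic gain per trait is twice the LGSI expected genetic gain per trait. Of course, 2 is only a proportionality constant; thus, in reality, $\mathbf{E}_C = \mathbf{E}_{I_G}$.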


5.2.3 Statistical Properties of the CLGSI


Assume that $H$ and $I_C$ have a bivariate joint normal distribution; $\mathbf{P}$, $\mathbf{C}$, $\boldsymbol{\Gamma}$, and $\mathbf{w}$ are known, and $\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\boldsymbol{\Psi}_C\mathbf{a}_C$; then, the CLGSI properties are as follows:


1. $\sigma^2_{I_C} = \sigma_{HI_C}$, i.e., the variance of $I_C$ ($\sigma^2_{I_C}$) and the covariance between $H$ and $I_C$ ($\sigma_{HI_C}$) are the same.

2. The maximized correlation between $H$ and $I_C$ is $\rho_{HI_C} = \sigma_{I_C}/\sigma_H$, where $\sigma_{I_C}$ is the standard deviation of $I_C$ and $\sigma_H$ is the standard deviation of $H$.

3. The variance of the prediction error, $\mathrm{Var}(H - I_C) = (1 - \rho^2_{HI_C})\sigma^2_H$, is minimal.

4. The total variance of $H$ explained by $I_C$ is $\sigma^2_{I_C} = \rho^2_{HI_C}\sigma^2_H$.


Note that CLGSI properties 1 to 4 are the same as LMSI properties 1 to 4 and that both indices jointly incorporate phenotypic and marker information to predict the net genetic merit; however, the LMSI incorporates the marker information by the marker score values, whereas the CLGSI uses the GEBVs.


5.2.4 Estimating the CLGSI Parameters 


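Using the real maize (Zea mays) F2 population with 248 genotypes (each with two repetitions), 233 molecular markers, and three traits, GY (ton ha$^{-1}$), EHT (cm), and PHT (cm), described in Sect. 5.1.8 of this chapter, we estimated matrices $\mathbf{P}$ and $\mathbf{C}$ using Eqs. (2.22) to (2.24) described in Chap. 2 of this book. The estimated matrices were

$$\hat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix} \quad \text{and} \quad \hat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}.$$

In a similar manner, we estimated matrix $\boldsymbol{\Gamma}$ using Eqs. (5.21) to (5.23). The estimated matrix was

$$\hat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}.$$

Note that matrices $\hat{\mathbf{C}}$ and $\hat{\boldsymbol{\Gamma}}$ have similar values. This means that, in the asymptotic context, we can assume that matrix $\boldsymbol{\Gamma}$ tends to matrix $\mathbf{C}$.

To estimate the CLGSI and its associated parameters (selection response, expected genetic gain per trait, etc.), we need to estimate the vector of coefficients $\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\boldsymbol{\Psi}_C\mathbf{a}_C$ as $\hat{\boldsymbol{\beta}}_C = \hat{\mathbf{T}}_C^{-1}\hat{\boldsymbol{\Psi}}_C\mathbf{a}_C$, where $\hat{\mathbf{T}}_C = \begin{bmatrix}\hat{\mathbf{P}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}}\end{bmatrix}$ and $\hat{\boldsymbol{\Psi}}_C = \begin{bmatrix}\hat{\mathbf{C}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}}\end{bmatrix}$ are estimates of matrices $\mathbf{T}_C = \begin{bmatrix}\mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\end{bmatrix}$ and $\boldsymbol{\Psi}_C = \begin{bmatrix}\mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\end{bmatrix}$ respectively. The estimated CLGSI vector of coefficients $\hat{\boldsymbol{\beta}}_C = \hat{\mathbf{T}}_C^{-1}\hat{\boldsymbol{\Psi}}_C\mathbf{a}_C$ is made up of the vector of phenotypic weights, $\hat{\boldsymbol{\beta}}_y = (\hat{\mathbf{P}} - \hat{\boldsymbol{\Gamma}})^{-1}(\hat{\mathbf{C}} - \hat{\boldsymbol{\Gamma}})\mathbf{w}$, and the vector of genomic weights, $\hat{\boldsymbol{\beta}}_G = [\mathbf{I} - (\hat{\mathbf{P}} - \hat{\boldsymbol{\Gamma}})^{-1}(\hat{\mathbf{C}} - \hat{\boldsymbol{\Gamma}})]\mathbf{w}$.

Let $\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1]$ be the vector of economic weights; then, according to the estimated matrices $\hat{\mathbf{P}}$, $\hat{\mathbf{C}}$, and $\hat{\boldsymbol{\Gamma}}$, $\hat{\boldsymbol{\beta}}_y' = [0.08 \;\; -0.02 \;\; -0.01]$ and $\hat{\boldsymbol{\beta}}_G' = [4.92 \;\; -0.08 \;\; -0.09]$, whence the estimated CLGSI in the training population can be written as

$$\hat{I}_C = \hat{\boldsymbol{\beta}}_y'\mathbf{y} + \hat{\boldsymbol{\beta}}_G'\hat{\boldsymbol{\gamma}}. \tag{5.43}$$

Suppose a selection intensity of 10% ($k_I = 1.755$); then, the estimated CLGSI selection response and expected genetic gain per trait were $\hat{R}_C = k_I\sqrt{\hat{\boldsymbol{\beta}}_C'\hat{\mathbf{T}}_C\hat{\boldsymbol{\beta}}_C} = 1.54$ and $\hat{\mathbf{E}}_C' = k_I\dfrac{\hat{\boldsymbol{\beta}}_C'\hat{\boldsymbol{\Psi}}_C}{\sqrt{\hat{\boldsymbol{\beta}}_C'\hat{\mathbf{T}}_C\hat{\boldsymbol{\beta}}_C}} = [0.36 \;\; 1.04 \;\; 1.70 \;\; 0.36 \;\; 1.53 \;\; 2.38]$ respectively, whereas the estimated CLGSI accuracy was $\hat{\rho}_{HI_C} = \dfrac{\sqrt{\hat{\boldsymbol{\beta}}_C'\hat{\mathbf{T}}_C\hat{\boldsymbol{\beta}}_C}}{\hat{\sigma}_H} = 0.814$.

The estimated LPSI selection response, expected genetic gain per trait, and accuracy were 0.601, [0.09 -0.81 -0.89], and 0.32 respectively; thus, the CLGSI was more efficient than the LPSI at predicting the net genetic merit because the CLGSI accuracy and selection response were 0.814 and 1.54 respectively.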
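
The calculations of this subsection can be retraced from the printed matrices. The sketch below assumes NumPy and uses the two-decimal values given above as inputs, so small rounding differences from the reported results are to be expected; the variable names are illustrative.

```python
import numpy as np

# Estimated matrices as printed (rounded to two decimals) in this subsection.
P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
Gamma = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])
w = np.array([5.0, -0.1, -0.1])
k_I = 1.755

T_C = np.block([[P, Gamma], [Gamma, Gamma]])
Psi_C = np.block([[C, Gamma], [Gamma, Gamma]])
a_C = np.concatenate([w, np.zeros(3)])

beta_C = np.linalg.solve(T_C, Psi_C @ a_C)          # Eq. (5.31)
beta_y, beta_G = beta_C[:3], beta_C[3:]
sd_index = np.sqrt(beta_C @ T_C @ beta_C)
R_C = k_I * sd_index                                # Eq. (5.33)
E_C = k_I * (Psi_C @ beta_C) / sd_index             # Eq. (5.34)
rho_C = sd_index / np.sqrt(w @ C @ w)               # Eq. (5.35)

# LPSI counterpart for comparison: b = P^{-1} C w.
b = np.linalg.solve(P, C @ w)
R_I = k_I * np.sqrt(b @ P @ b)
rho_I = np.sqrt(b @ P @ b) / np.sqrt(w @ C @ w)
```

With these rounded inputs the results should agree with the values reported above to roughly two decimals, for both the CLGSI and the LPSI.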


5.2.5 LGSI and CLGSI Efficiency Vs LMSI, GW-LMSI and 
LPSI Efficiency 


In this subsection, we compare the accuracy, selection response, and efficiency of the 
LGSI and CLGSI with the LMSI, the GW-LMSI, and the LPSI using the simulated 
data for a maize (Zea mays) population described in Chap. 2, Sect. 2.8.1. 

Figure 5.3 presents the estimated accuracy values of the LMSI, the LGSI, the 
CLGSI, the LPSI, and the GW-LMSI for five simulated selection cycles. According to 
these results, for the first three selection cycles, the estimated accuracies of the indices, 
in decreasing order, were LMSI > LGSI > CLGSI > LPSI > GW-LMSI. That is, the 
highest estimated accuracy was obtained with the LMSI, whereas the lowest was 
obtained with the GW-LMSI. For the fourth and fifth selection cycles, the estimated 
accuracies, in decreasing order, were LMSI > LPSI > CLGSI > LGSI > GW-LMSI. 
This means that in all five selection cycles, the LMSI had the highest accuracy and the
GW-LMSI had the lowest accuracy, whereas the estimated LGSI accuracy dropped
to fourth place. Thus, the accuracy of the LGSI tended to decrease after the
first three selection cycles, whereas LPSI accuracy remained constant.

To compare LGSI efficiency versus the efficiency of the other selection indices, 
we assumed that the interval between selection cycles in the LGSI is 1.5 years, 
whereas for CLGSI, LMSI, GW-LMSI, and LPSI, the interval was 4.0 years. 
Table 5.3 presents the estimated selection response of the LPSI, the LMSI, the 
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Fig. 5.3 Estimated accuracy values of the linear molecular selection index (МІ), the LGSI, the 
combined LGSI (CLGSD), the LPSI, and the genome-wide LMSI (GW-LMSJ) with the net genetic 
merit for four traits, 2500 markers, and 500 genotypes (each with four repetitions) in one environ- 
ment for five simulated selection cycles 


Table 5.3 Estimated selection response of the linear phenotypic selection index (LPSJ), the linear 
molecular selection index (LMSD, the genome-wide LMSI (GW-LMSI), the linear genomic 
selection index (LGSI), and the combined LGSI (CLGSD), not including (first part of the Table) 
and including (second part of the Table) the interval length between selection cycles, obtained using 
five simulated selection cycles 


Cycle LPSI LMSI GW-LMSI LGSI C-LGSI 
Estimated selection response not including the interval length 

1 17.84 19.60 16.24 14.36 18.24 

2 15.66 24.36 13.88 13.90 16.02 

3 14.44 14.70 12.13 13.59 14.61 

4 14.29 15.29 12.48 12.30 14.14 

5 13.86 15.15 11.49 11.38 13.51 
Average 15.22 17.82 13.24 13.11 15.30 


Estimated selection response including the interval length* 


1 4.46 4.90 4.06 9.58 4.56 
2 3.92 6.09 3.47 9.27 4.00 
3 3.61 3.68 3.03 9.06 3.65 
4 3.57 3.82 3.12 8.20 3.53 
5 3.47 3.79 2.87 7.59 3.38 
Average 3.80 4.46 3.31 8.74 3.83 


"The interval length for the LPSI, LMSI, GW-LMSI, and C-LGSI was 4 years, whereas the interval 


length for the LGSI was 1.5 years 
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Table 5.4 Estimated accuracy of ће LMSI, the LGSI, the CLGSI, ће LPSI, and the GW-LMSI; 
LMSI efficiency compared with LGSI, CLGSI, LPSI, and GW-LMSI efficiencies, expressed in 
percentages, for five simulated selection cycles 


Estimated accuracy LMSI efficiency compared with 
GW- GW- 
Cycle LMSI |LGSI |CLGSI |LPSI |LMSI LGSI |CLGSI |LPSI |LMSI 


GW-LMSI, the LGSI, and the CLGSI, not including and including the interval
between selection cycles (first and second parts of Table 5.3 respectively), obtained
using five simulated selection cycles. According to the first part of Table 5.3, the
average estimated selection responses of the LMSI, CLGSI, LPSI, GW-LMSI, and LGSI
for the five simulated selection cycles were, in decreasing order, 17.82, 15.30, 15.22,
13.24, and 13.11 respectively, when the length of the interval between selection cycles
was not included. To compare the selection response of the indices in terms of time,
the estimated selection responses of the LMSI, CLGSI, LPSI, and GW-LMSI must be
divided by 4 in each selection cycle, and the estimated LGSI selection response must be
divided by 1.5. Thus, according to the second part of Table 5.3, when the length of the
interval between selection cycles is included, the average estimated selection
responses of the LGSI, LMSI, CLGSI, LPSI, and GW-LMSI for the five simulated
selection cycles were, in decreasing order, 8.74, 4.46, 3.83, 3.80, and 3.31. This means
that in terms of time, the efficiency of the LGSI was higher than the efficiency of the
other four selection indices.
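
The division by the interval length described above can be sketched in a few lines. This is only an illustration using the averages of the first part of Table 5.3 and the interval lengths of its footnote; the dictionary and function names are hypothetical and not part of the original analysis.

```python
# Average estimated selection responses per cycle (first part of Table 5.3).
AVERAGE_RESPONSE = {
    "LPSI": 15.22, "LMSI": 17.82, "GW-LMSI": 13.24, "LGSI": 13.11, "CLGSI": 15.30,
}
# Interval lengths in years, as stated in the footnote of Table 5.3.
INTERVAL_YEARS = {
    "LPSI": 4.0, "LMSI": 4.0, "GW-LMSI": 4.0, "LGSI": 1.5, "CLGSI": 4.0,
}


def response_per_year(response, interval_years):
    """Divide a per-cycle selection response by the cycle length in years."""
    return response / interval_years


per_year = {
    name: response_per_year(AVERAGE_RESPONSE[name], INTERVAL_YEARS[name])
    for name in AVERAGE_RESPONSE
}
# Indices sorted from the largest to the smallest response per year.
ranking = sorted(per_year, key=per_year.get, reverse=True)

for name in ranking:
    print(f"{name:8s} {per_year[name]:.2f}")
```

Under these inputs the LGSI ranks first once time is taken into account, although it ranks last per cycle.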

Table 5.4 presents the estimated accuracy of the LMSI, LGSI, CLGSI, LPSI, and
GW-LMSI. In addition, Table 5.4 presents the efficiency of the LMSI at predicting the
net genetic merit relative to the LGSI, CLGSI, LPSI, and GW-LMSI, expressed as
percentages, for five simulated selection cycles. Note that in this case, LMSI
efficiency was higher than the efficiency of the other four selection indices, because
the LMSI had the highest correlation with the net genetic merit.
Chapter 6
Constrained Linear Genomic Selection Indices


Abstract The constrained linear genomic selection indices are null restricted and
predetermined proportional gain linear genomic selection indices (RLGSI and
PPG-LGSI respectively), which are linear combinations of genomic estimated
breeding values (GEBVs) used to predict the net genetic merit. They result from a
direct application of the restricted and the predetermined proportional gain linear
phenotypic selection index theory to the genomic selection context. The RLGSI can
be extended to a combined RLGSI (CRLGSI) and the PPG-LGSI can be extended to
a combined PPG-LGSI (CPPG-LGSI); the latter indices use phenotypic and GEBV
information jointly in the prediction of net genetic merit. The main difference
between the RLGSI and PPG-LGSI with respect to the CRLGSI and the CPPG-LGSI
is that although the RLGSI and PPG-LGSI are useful in a testing population
where there is only marker information, the CRLGSI and CPPG-LGSI can be used
only in training populations where there is joint phenotypic and marker information.
The RLGSI and CRLGSI allow restrictions equal to zero to be imposed on the
expected genetic advance of some traits, whereas the PPG-LGSI and CPPG-LGSI
allow predetermined proportional restriction values to be imposed on the expected
trait genetic gains to make some traits change their mean values based on a
predetermined level. We describe the foregoing four indices and validate their
theoretical results using real and simulated data.


6.1 The Restricted Linear Genomic Selection Index 


Let $H = w'g$ be the net genetic merit and $I_G = \beta'\gamma$ the linear genomic selection index
(LGSI, see Chap. 5 for details), where $g$, $\gamma$, $w$, and $\beta$ are $t \times 1$ vectors ($t$ = number of
traits) of breeding values, genomic breeding values, economic weights, and LGSI
coefficients respectively. It can be shown that $\mathrm{Cov}(I_G, g) = \Gamma\beta$ is the covariance
between $g$ and $I_G = \beta'\gamma$, and that $\mathrm{Var}(\gamma) = \Gamma$ is the genomic covariance matrix of
size $t \times t$ (see Chap. 5 for details). The objective of the restricted linear genomic
selection index (RLGSI) is to improve only $(t - r)$ of $t$ ($r < t$) traits (leaving $r$ of them
fixed) in a testing population using only genomic estimated breeding values
(GEBVs). The RLGSI minimizes the mean squared difference between $I_G$ and $H$,
$E[(H - I_G)^2]$, with respect to $\beta$ under the restriction $\mathrm{Cov}(I_G, U'g) = U'\Gamma\beta = 0$, where
$U'$ is a matrix $(t - 1) \times t$ of 1s and 0s, in a similar manner to the restricted linear
phenotypic selection index (RLPSI) described in Chap. 3 in the phenotypic selection
context.

© The Author(s) 2018
J. J. Cerón-Rojas, J. Crossa, Linear Selection Indices in Modern Plant Breeding,
https://doi.org/10.1007/978-3-319-91223-3_6


6.1.1 The Maximized RLGSI Parameters 


Let $\mathrm{Var}(I_G) = \beta'\Gamma\beta$ be the variance of $I_G = \beta'\gamma$, $w'Cw$ the variance of $H = w'g$, and
$\mathrm{Cov}(I_G, H) = w'\Gamma\beta$ the covariance between $H = w'g$ and $I_G = \beta'\gamma$. The mean
squared difference between $H$ and $I_G$ can be written as $E[(H - I_G)^2]$, which should be
minimized under the restriction $U'\Gamma\beta = 0$ assuming that $\Gamma$, $C$, $U'$, and $w$ are known,
i.e., it is necessary to minimize the function

$$f_R(\beta, v) = w'Cw + \beta'\Gamma\beta - 2w'\Gamma\beta + 2v'U'\Gamma\beta \tag{6.1}$$

with respect to vectors $\beta$ and $v' = [v_1 \; v_2 \; \cdots \; v_{t-1}]$, where $v$ is a vector of Lagrange
multipliers. In matrix notation, the derivative results of Eq. (6.1) are

$$\begin{bmatrix} \beta \\ v \end{bmatrix} = \begin{bmatrix} \Gamma & \Gamma U \\ U'\Gamma & 0 \end{bmatrix}^{-1} \begin{bmatrix} \Gamma w \\ 0 \end{bmatrix}. \tag{6.2}$$

Following the procedure described in Chap. 3 (Eqs. 3.2 to 3.5), it can be shown
that the RLGSI vector of coefficients that minimizes $E[(H - I_G)^2]$ under the
restriction $U'\Gamma\beta = 0$ is

$$\beta_{RG} = K_G w, \tag{6.3}$$

where $K_G = [I_t - Q_G]$, $Q_G = U(U'\Gamma U)^{-1}U'\Gamma$, $w$ is a vector of economic weights,
and $I_t$ is an identity matrix $t \times t$. When no restrictions are imposed on any of the
traits, $U'$ is a null matrix and $\beta_{RG} = w$, the optimized LGSI vector of coefficients (see
Chap. 5 for details).
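
The step from Eq. (6.1) to Eq. (6.3) can be written out explicitly. The following is a sketch of the usual Lagrange-multiplier argument, assuming that $\Gamma$ and $U'\Gamma U$ are nonsingular; it is an illustration and not a reproduction of the derivation in Chap. 3.

```latex
% First-order conditions of f_R(\beta, v) in Eq. (6.1)
\frac{\partial f_R}{\partial \beta} = 2\Gamma\beta - 2\Gamma w + 2\Gamma U v = 0,
\qquad
\frac{\partial f_R}{\partial v} = 2U'\Gamma\beta = 0 .

% The first condition gives \beta = w - Uv (assuming \Gamma is nonsingular).
% Substituting into the second condition:
U'\Gamma w - U'\Gamma U v = 0
\;\Longrightarrow\;
v = (U'\Gamma U)^{-1} U'\Gamma w .

% Hence
\beta_{RG} = w - U (U'\Gamma U)^{-1} U'\Gamma w
           = \left[ I_t - Q_G \right] w = K_G w ,
\qquad Q_G = U (U'\Gamma U)^{-1} U'\Gamma .
```

The two first-order conditions stacked together are the linear system whose solution is given in Eq. (6.2).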

By Eq. (6.3), the RLGSI, and the maximized RLGSI selection response and
expected genetic gain per trait can be written as

$$I_{RG} = \beta_{RG}'\gamma, \tag{6.4}$$

$$R_{RG} = \frac{k_I}{L_G}\sqrt{\beta_{RG}'\Gamma\beta_{RG}}, \tag{6.5}$$

and

$$E_{RG} = \frac{k_I}{L_G}\frac{\Gamma\beta_{RG}}{\sqrt{\beta_{RG}'\Gamma\beta_{RG}}}, \tag{6.6}$$

respectively, where $k_I$ is the standardized selection differential (or selection
intensity) associated with the RLGSI, and $L_G$ is the interval between selection cycles or
the time required to complete a selection cycle using the RLGSI. Equations (6.4) to
(6.6) depend only on GEBV information; thus, they are useful in testing populations.

6.1.2 Statistical Properties of RLGSI 


Assuming that $H = w'g$ and $I_{RG} = \beta_{RG}'\gamma$ have bivariate joint normal distribution,
$\beta_{RG} = K_G w$, and $\Gamma$, $C$, and $w$ are known, it can be shown that the RLGSI has the
following properties:

1. Matrices $K_G$ and $Q_G$ are idempotent ($K_G = K_G^2$ and $Q_G = Q_G^2$) and orthogonal
   ($K_G Q_G = Q_G K_G = 0$), that is, they are projectors. Matrix $Q_G$ projects vector
   $\beta = w$ into a space generated by the columns of matrix $U'\Gamma$ due to the restriction
   $U'\Gamma\beta = 0$ used when $f_R(\beta, v)$ (Eq. 6.1) is minimized with respect to vectors $\beta$ and
   $v$, whereas matrix $K_G$ projects $w$ into a space perpendicular to that generated by
   the $U'\Gamma$ matrix columns.

2. Because of the restriction $U'\Gamma\beta = 0$, matrix $K_G$ projects vector $w$ into a space
   smaller than the original space of $w$. The space reduction into which matrix $K_G$
   projects $w$ is equal to the number of zeros that appears in Eq. (6.6).

3. Vector $\beta_{RG} = K_G w$ minimizes the mean square error under the restriction
   $U'\Gamma\beta = 0$.

4. The variance of $I_{RG} = \beta_{RG}'\gamma$ ($\sigma^2_{I_{RG}} = \beta_{RG}'\Gamma\beta_{RG}$) is equal to the covariance between
   $I_{RG} = \beta_{RG}'\gamma$ and $H = w'g$ ($\sigma_{HI_{RG}} = w'\Gamma\beta_{RG}$).

5. The maximized correlation between $H$ and $I_{RG}$ is equal to $\rho_{HI_{RG}} = \sigma_{I_{RG}}/\sigma_H$, where
   $\sigma_{I_{RG}} = \sqrt{\beta_{RG}'\Gamma\beta_{RG}}$ and $\sigma_H = \sqrt{w'Cw}$ are the standard deviations of $I_{RG} = \beta_{RG}'\gamma$
   and $H = w'g$ respectively.

6. The variance of the predicted error, $\mathrm{Var}(H - I_{RG}) = (1 - \rho^2_{HI_{RG}})\sigma^2_H$, is minimal.
   Note that $\mathrm{Var}(H - I_{RG}) = \sigma^2_{I_{RG}} + \sigma^2_H - 2\sigma_{HI_{RG}}$, and when $\beta_{RG} = K_G w$,
   $\sigma^2_{I_{RG}} = \sigma_{HI_{RG}}$, whence $\mathrm{Var}(H - I_{RG}) = \sigma^2_H - \sigma^2_{I_{RG}} = (1 - \rho^2_{HI_{RG}})\sigma^2_H$ is minimal.

The statistical RLGSI properties are equal to the statistical RLPSI properties.
Thus the RLGSI is an application of the RLPSI to the genomic selection context.
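
Properties 1 and 4 can be checked numerically on a small example. The sketch below uses the 3 x 3 genomic covariance matrix and the economic weights reported in the numerical example of Sect. 6.1.3, with a restriction on the first trait; the helper functions (`matmul`, `transpose`, `inverse`, `projectors`, `close`) are hypothetical names written here with the standard library only.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def projectors(gamma, u_prime):
    """Return K_G = I - Q_G and Q_G = U (U' Gamma U)^-1 U' Gamma."""
    u = transpose(u_prime)
    u_gamma = matmul(u_prime, gamma)
    q = matmul(matmul(u, inverse(matmul(u_gamma, u))), u_gamma)
    n = len(gamma)
    k = [[float(i == j) - q[i][j] for j in range(n)] for i in range(n)]
    return k, q


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


GAMMA = [[0.21, 2.95, 5.00], [2.95, 42.41, 71.11], [5.00, 71.11, 121.53]]
W = [[5.0], [-0.1], [-0.1]]          # economic weights as a column vector
U_PRIME = [[1.0, 0.0, 0.0]]          # one null restriction, on trait 1

K, Q = projectors(GAMMA, U_PRIME)
ZERO = [[0.0] * 3 for _ in range(3)]

idempotent = close(matmul(K, K), K) and close(matmul(Q, Q), Q)            # property 1
orthogonal = close(matmul(K, Q), ZERO) and close(matmul(Q, K), ZERO)      # property 1

beta = matmul(K, W)                                                       # Eq. (6.3)
var_index = matmul(matmul(transpose(beta), GAMMA), beta)[0][0]            # beta' Gamma beta
cov_index_merit = matmul(matmul(transpose(W), GAMMA), beta)[0][0]         # w' Gamma beta

print(idempotent, orthogonal, var_index, cov_index_merit)
```

A check of this kind only confirms the properties for one set of inputs; the general result follows from the algebra of $Q_G$.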


6.1.3 Numerical Examples 


To estimate the parameters associated with the RLGSI, we use the real data set
described in Chap. 5, Sect. 5.1.8, where we found that, in the testing population, the
estimate of matrix $\Gamma$ was

$$\hat{\Gamma} = \begin{bmatrix} 0.21 & 2.95 & 5.00 \\ 2.95 & 42.41 & 71.11 \\ 5.00 & 71.11 & 121.53 \end{bmatrix}.$$

We use this matrix and the GEBVs associated with the traits grain yield (GY, ton ha$^{-1}$),
ear height (EHT, cm), and plant height (PHT, cm) to illustrate the RLGSI theoretical results.

Suppose that on the RLGSI expected genetic gain per trait we impose one
and two null restrictions using matrices $U_1' = [1 \; 0 \; 0]$ and
$U_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$
(see Chap. 3, Sect. 3.1.3, for details about matrix $U'$). We need to estimate the
RLGSI vector of coefficients ($\beta_{RG} = K_G w$) as $\hat{\beta}_{RG} = \hat{K}_G w$, where $\hat{K}_G = [I_3 - \hat{Q}_G]$
and $\hat{Q}_G = U(U'\hat{\Gamma}U)^{-1}U'\hat{\Gamma}$ are estimates of matrices $K_G = [I_3 - Q_G]$ and
$Q_G = U(U'\Gamma U)^{-1}U'\Gamma$ respectively, and $I_3$ is an identity matrix $3 \times 3$. The estimated $Q_G$
matrices for restrictions $U_1'$ and $U_2'$ were

$$\hat{Q}_{G_1} = U_1(U_1'\hat{\Gamma}U_1)^{-1}U_1'\hat{\Gamma} = \begin{bmatrix} 1.0 & 14.05 & 23.81 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
\hat{Q}_{G_2} = U_2(U_2'\hat{\Gamma}U_2)^{-1}U_2'\hat{\Gamma} = \begin{bmatrix} 1.0 & 0 & 11.18 \\ 0 & 1.0 & 0.90 \\ 0 & 0 & 0 \end{bmatrix}$$

respectively, whereas the estimated $K_G$ matrices for both restrictions were

$$\hat{K}_{G_1} = [I_3 - \hat{Q}_{G_1}] = \begin{bmatrix} 0 & -14.05 & -23.81 \\ 0 & 1.0 & 0 \\ 0 & 0 & 1.0 \end{bmatrix}
\quad \text{and} \quad
\hat{K}_{G_2} = [I_3 - \hat{Q}_{G_2}] = \begin{bmatrix} 0 & 0 & -11.18 \\ 0 & 0 & -0.90 \\ 0 & 0 & 1.0 \end{bmatrix}.$$

Let $w' = [5 \; -0.1 \; -0.1]$ be the vector of economic weights; then the estimated
RLGSI vectors of coefficients for one and two null restrictions were
$\hat{\beta}_{RG_1}' = w'\hat{K}_{G_1}' = [3.78 \; -0.1 \; -0.1]$ and $\hat{\beta}_{RG_2}' = w'\hat{K}_{G_2}' = [1.12 \; 0.09 \; -0.1]$ respectively, and
the estimated RLGSI for both restrictions can be written as
$\hat{I}_{RG_1} = 3.78\,\mathrm{GEBV}_1 - 0.1\,\mathrm{GEBV}_2 - 0.1\,\mathrm{GEBV}_3$ and
$\hat{I}_{RG_2} = 1.12\,\mathrm{GEBV}_1 + 0.09\,\mathrm{GEBV}_2 - 0.1\,\mathrm{GEBV}_3$, where
$\mathrm{GEBV}_1$, $\mathrm{GEBV}_2$, and $\mathrm{GEBV}_3$ are the genomic estimated breeding values associated with
traits GY, EHT, and PHT respectively in the testing population.

Table 6.1 presents 20 genotypes selected from a population of 380 genotypes and
the GEBVs in the testing population ranked according to the estimated RLGSI
values for one restriction, where $U_1' = [1 \; 0 \; 0]$. The estimated RLGSI
values for genotypes 5 and 306 can be obtained as follows:
$\hat{I}_{RG_1} = 3.78(-0.6) - 0.1(-8.67) - 0.1(-15.97) = 0.196$ and
$\hat{I}_{RG_1} = 3.78(0.13) - 0.1(1.31) - 0.1(1.66) = 0.194$ respectively. This procedure is valid
for any number of genotypes and GEBVs in the testing population.
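
The coefficient vectors and index values of this example can be recomputed as follows. This is a minimal sketch with the standard library only; the helper names are hypothetical, and the unrounded first coefficient for one restriction is about 3.786, which the text reports as 3.78.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def rlgsi_coefficients(gamma, u_prime, w):
    """beta_RG = [I - U (U' Gamma U)^-1 U' Gamma] w, Eq. (6.3)."""
    u = transpose(u_prime)
    u_gamma = matmul(u_prime, gamma)
    q = matmul(matmul(u, inverse(matmul(u_gamma, u))), u_gamma)
    n = len(gamma)
    k = [[float(i == j) - q[i][j] for j in range(n)] for i in range(n)]
    return [row[0] for row in matmul(k, [[x] for x in w])]


def index_value(beta, gebv):
    """Linear index: sum of coefficient times GEBV over traits, Eq. (6.4)."""
    return sum(b * g for b, g in zip(beta, gebv))


GAMMA = [[0.21, 2.95, 5.00], [2.95, 42.41, 71.11], [5.00, 71.11, 121.53]]
W = [5.0, -0.1, -0.1]
U1_PRIME = [[1.0, 0.0, 0.0]]
U2_PRIME = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

beta_one = rlgsi_coefficients(GAMMA, U1_PRIME, W)   # one null restriction
beta_two = rlgsi_coefficients(GAMMA, U2_PRIME, W)   # two null restrictions

# Index values for genotypes 5 and 306, using the coefficients as printed in the text.
printed_beta_one = [3.78, -0.1, -0.1]
value_5 = index_value(printed_beta_one, [-0.6, -8.67, -15.97])
value_306 = index_value(printed_beta_one, [0.13, 1.31, 1.66])

print([round(b, 3) for b in beta_one], [round(b, 3) for b in beta_two])
print(round(value_5, 3), round(value_306, 3))
```

The index values reproduce the first two rows of Table 6.1 only when the plant height GEBV of genotype 5 enters with its negative sign, as listed in the table.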

Table 6.1 Number of genotypes selected from 380 genotypes of a real testing population; genomic
estimated breeding values (GEBVs) associated with three traits: grain yield (GY, ton ha^-1), ear
height (EHT, cm), and plant height (PHT, cm) in the testing population, and estimated and ranked
restricted linear genomic selection index (RLGSI) values obtained in the testing population for one
null restriction

                       Estimated GEBVs in the testing population
Number of genotypes    GEBV-GY    GEBV-EHT    GEBV-PHT    Estimated RLGSI
5                      -0.6       -8.67       -15.97      0.196
306                    0.13       1.31        1.66        0.194
6                      0.06       1.83        -1.13       0.157
349                    0.37       4.34        8.12        0.153
142                    -0.26      -5.47       -5.85       0.149
69                     -0.11      -3.43       -2.16       0.143
24                     0.03       -0.43       0.19        0.137
192                    -0.8       -13.91      -17.71      0.137
33                     -0.18      -1.44       -6.71       0.135
18                     -0.43      -5.48       -12.08      0.131
21                     -1.00      -16.11      -22.96      0.127
41                     0.17       1.09        4.08        0.126
351                    0.16       2.64        2.15        0.126
323                    0.04       -0.79       1.04        0.126
158                    -0.49      -8.95       -10.83      0.126
25                     -0.24      -3.46       -6.86       0.125
338                    0.37       3.88        8.89        0.122
316                    -0.01      -0.51       -1.09       0.122
32                     -0.19      -3.97       -4.43       0.122
204                    -0.46      -7.41       -11.19      0.121

Assume a selection intensity of 10% ($k_I = 1.755$); then the estimated
RLGSI selection response and expected genetic gain per trait not including the
interval length were $\hat{R}_{RG_1} = k_I\sqrt{\hat{\beta}_{RG_1}'\hat{\Gamma}\hat{\beta}_{RG_1}} = 0.40$ and
$\hat{E}_{RG_1}' = k_I\dfrac{\hat{\beta}_{RG_1}'\hat{\Gamma}}{\sqrt{\hat{\beta}_{RG_1}'\hat{\Gamma}\hat{\beta}_{RG_1}}} = [0 \; -1.42 \; -2.58]$
respectively. For two restrictions, with $U_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$,
the estimated RLGSI selection response and expected genetic gains not including
the interval length were $\hat{R}_{RG_2} = k_I\sqrt{\hat{\beta}_{RG_2}'\hat{\Gamma}\hat{\beta}_{RG_2}} = 0.23$ and
$\hat{E}_{RG_2}' = k_I\dfrac{\hat{\beta}_{RG_2}'\hat{\Gamma}}{\sqrt{\hat{\beta}_{RG_2}'\hat{\Gamma}\hat{\beta}_{RG_2}}} = [0 \; 0 \; -2.29]$
respectively. When the number of
restrictions increases, the estimated RLGSI selection response value decreases,
whereas the number of zeros increases in the estimated RLGSI expected genetic
gain per trait. The number of zeros in the estimated RLGSI expected genetic gain
per trait is equal to the number of restrictions imposed on RLGSI by matrix $U'$,
where each restriction appears as 1.

Figure 6.1 presents the frequency distribution of the estimated RLGSI values for
one (Fig. 6.1a) and two null restrictions (Fig. 6.1b). For both restrictions the
frequency distribution of the estimated RLGSI values approaches the normal
distribution.
[Figure 6.1: histograms of the estimated RLGSI values. Panel (a): RLGSI values for one null
restriction. Panel (b): RLGSI values for two null restrictions.]

Fig. 6.1 Distribution of 380 estimated restricted linear genomic selection index (RLGSI) values
with one (a) and two (b) null restrictions respectively obtained in a real testing population for one
selection cycle in one environment
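
The selection response and expected genetic gains reported for the real-data example of Sect. 6.1.3 (0.40 and [0, -1.42, -2.58] for one restriction; 0.23 and [0, 0, -2.29] for two) follow from Eqs. (6.5) and (6.6) with the interval length left out. The sketch below recomputes them from the unrounded coefficients; the function names are hypothetical and only the standard library is used.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def rlgsi_coefficients(gamma, u_prime, w):
    """beta_RG = w - U (U' Gamma U)^-1 U' Gamma w, equivalent to Eq. (6.3)."""
    u = transpose(u_prime)
    u_gamma = matmul(u_prime, gamma)
    v = matmul(inverse(matmul(u_gamma, u)), matmul(u_gamma, [[x] for x in w]))
    return [wi - c[0] for wi, c in zip(w, matmul(u, v))]


def response_and_gain(gamma, beta, k, interval=1.0):
    """Eqs. (6.5) and (6.6); interval=1.0 means the interval length is not included."""
    gamma_beta = [sum(g * b for g, b in zip(row, beta)) for row in gamma]
    sd_index = math.sqrt(sum(b * x for b, x in zip(beta, gamma_beta)))
    response = k * sd_index / interval
    gain = [k * x / (sd_index * interval) for x in gamma_beta]
    return response, gain


GAMMA = [[0.21, 2.95, 5.00], [2.95, 42.41, 71.11], [5.00, 71.11, 121.53]]
W = [5.0, -0.1, -0.1]
K_I = 1.755  # selection intensity for 10% selected

beta_one = rlgsi_coefficients(GAMMA, [[1.0, 0.0, 0.0]], W)
beta_two = rlgsi_coefficients(GAMMA, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], W)

r_one, e_one = response_and_gain(GAMMA, beta_one, K_I)
r_two, e_two = response_and_gain(GAMMA, beta_two, K_I)

print(round(r_one, 2), [round(x, 2) for x in e_one])
print(round(r_two, 2), [round(x, 2) for x in e_two])
```

Using the rounded coefficients printed in the text instead of the unrounded ones shifts the expected gains noticeably, because the restricted gains are differences of nearly equal terms.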


Now we use the simulated data set described in Chap. 2, Sect. 2.8.1, to compare
RLPSI (restricted linear phenotypic selection index, see Chap. 3 for details) efficiency
versus RLGSI efficiency. Table 6.2 presents the estimated RLPSI and RLGSI
selection responses for one, two, and three null restrictions imposed by matrices
$U_1' = [1 \; 0 \; 0 \; 0]$,
$U_2' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$, and
$U_3' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$
for five simulated
selection cycles including and not including the interval between selection cycles. In
each selection cycle, the sample size was equal to 500 genotypes, each with four
repetitions and four traits, whereas the selection intensity was 10% ($k_I = 1.755$); the
interval lengths for the RLPSI and RLGSI were 4 and 1.5 years (Beyene et al. 2015)
respectively.

Table 6.2 is divided into two parts. The first part presents the estimated RLPSI
selection responses, whereas the second part presents the estimated RLGSI selection
responses. In each part, columns 2, 3, and 4 present the estimated selection responses
not including the interval length, whereas columns 5, 6, and 7 present the estimated
selection responses including the interval length. The averages of
the estimated RLPSI selection response not including the interval length for one,
two, and three restrictions were 7.04, 5.50, and 3.90, whereas when the interval
length was included, the averages were 1.76, 1.38, and 0.98 respectively. The
averages of the estimated RLGSI selection response not including the interval length
for one, two, and three restrictions were 5.04, 3.72, and 2.79, whereas when the
interval length was included the averages were 3.36, 2.48, and 1.86 respectively.
These results indicate that when the interval length was included in the estimation
of the RLPSI and RLGSI selection response, RLGSI efficiency was greater than
RLPSI efficiency; conversely, when the interval length was not included,
RLPSI efficiency was greater than RLGSI efficiency.

Table 6.2 Estimated restricted linear phenotypic selection index (RLPSI) and RLGSI selection
responses for 1, 2, and 3 null restrictions for 5 simulated selection cycles including and not
including the interval between selection cycles. The interval lengths for the RLPSI and the
RLGSI were 4 and 1.5 years respectively

Estimated RLPSI selection response

           Not including interval length    Including interval length^a
Cycle      1       2       3                1       2       3
1          6.87    5.54    4.13             1.72    1.39    1.03
2          8.45    5.94    4.27             2.11    1.49    1.07
3          7.17    5.79    4.16             1.79    1.45    1.04
4          6.68    5.06    3.72             1.67    1.27    0.93
5          6.02    5.16    3.24             1.51    1.29    0.81
Average    7.04    5.50    3.90             1.76    1.38    0.98

Estimated RLGSI selection response

           Not including interval length    Including interval length^b
Cycle      1       2       3                1       2       3
1          6.41    5.58    4.71             4.28    -       3.14
2          5.04    3.47    2.47             3.36    -       1.65
3          4.76    3.36    2.22             3.17    -       1.48
4          4.51    3.07    2.28             3.01    -       1.52
5          4.46    3.10    2.26             2.97    -       1.51
Average    5.04    3.72    2.79             3.36    2.48    1.86

^a The estimated RLPSI selection response was divided by 4
^b The estimated RLGSI selection response was divided by 1.5

Table 6.3 presents the estimated RLPSI (first part) and RLGSI (second part)
expected genetic gain per trait not including the interval between selection cycles for
one, two, and three null restrictions in five simulated selection cycles. In this case,
RLPSI efficiency is greater than RLGSI efficiency because the averages of the
estimated RLPSI expected genetic gain per trait were -2.52, 2.26, and 2.26 for
one null restriction; 2.84 and 2.65 for two null restrictions; and 3.90 for three null
restrictions. For the same set of restrictions, the averages of the estimated RLGSI
expected genetic gain per trait were -1.85, 1.13, and 2.06 for one null restriction;
1.52 and 2.19 for two null restrictions; and 2.79 for three null restrictions. However,
divided by the interval length (4 years in the RLPSI), the averages of the estimated
RLPSI expected genetic gain per trait were -0.63, 0.57, and 0.57 for one null
restriction; 0.71 and 0.66 for two null restrictions; and 0.98 for three null restrictions.
In a similar manner, dividing by the interval length (1.5 years in this case), the
averages of the estimated RLGSI expected genetic gain per trait were -1.23, 0.75,
and 1.37 for one restriction; 1.01 and 1.46 for two restrictions; and 1.86 for three
restrictions.

Table 6.3 Estimated RLPSI and RLGSI expected genetic gain per trait for 1, 2, and 3 null
restrictions for 5 simulated selection cycles (each with 4 traits) not including the interval length
between selection cycles

Estimated RLPSI expected genetic gain for 1, 2, and 3 null restrictions

           1 restriction               2 restrictions              3 restrictions
Cycle      T1   T2      T3     T4      T1   T2   T3     T4         T1   T2   T3   T4
1          0    -2.48   2.03   2.66    0    0    2.77   2.77       0    0    0    4.13
2          0    -3.41   2.33   2.71    0    0    2.87   3.07       0    0    0    4.27
3          0    -2.30   3.12   1.74    0    0    3.11   2.68       0    0    0    4.16
4          0    -2.88   1.42   2.38    0    0    2.35   2.70       0    0    0    3.72
5          0    -1.83   2.38   1.81    0    0    3.12   2.04       0    0    0    3.24
Average    0    -2.52   2.26   2.26    0    0    2.84   2.65       0    0    0    3.90

Estimated RLGSI expected genetic gain for 1, 2, and 3 null restrictions

           3 restrictions
Cycle      T4
1          4.71
2          2.47
3          2.22
4          2.28
5          2.26
Average    2.79

Table 6.4 presents the estimated RLPSI heritability ($h^2_{I_R}$) values, the estimated
restricted linear genomic selection index (RLGSI) accuracy ($\rho_{HI_{RG}}$) values, the values
of $W = \dfrac{\rho_{HI_{RG}}}{h_{I_R}} L_{RP}$ ($L_{RP} = 4$), and the values of $p = 100(\lambda_R - 1)$, where
$\lambda_R = \rho_{HI_R}/\rho_{HI_{RG}}$ and $\rho_{HI_R}$ is the estimated RLPSI accuracy, for one, two, and three
restrictions for five simulated selection cycles. The RLGSI interval length was $L_{RG} = 1.5$,
whereas the averages of the values of $W = \dfrac{\rho_{HI_{RG}}}{h_{I_R}} L_{RP}$ for each restriction were
1.22, 0.85, and 0.60; this means that the estimated Technow inequality (Technow
et al. 2013), $L_{RG} < \dfrac{\rho_{HI_{RG}}}{h_{I_R}} L_{RP}$ (Chap. 5, Eq. 5.18), was not true. Thus, according to
the Technow inequality results, for this data set, RLGSI efficiency in terms of time
was not greater than RLPSI efficiency. The inequality was not true
because the estimated RLGSI accuracy was very low, whereas RLPSI heritability was
high: the averages of the estimated RLGSI accuracy for one, two, and
three null restrictions were 0.25, 0.19, and 0.14 respectively, and the averages of the
estimated RLPSI heritability values were 0.70, 0.78, and 0.88 respectively. Thus,
according to these results, RLGSI efficiency was lower than RLPSI efficiency in
terms of time.

The last three columns of Table 6.4, from left to right, present the estimated
p values, $p = 100(\lambda_R - 1)$, for one, two, and three null restrictions in five simulated
selection cycles. The averages of the p values indicate that, for one, two, and three
restrictions, RLPSI efficiency was 65.05%, 78.73%, and 74.09% greater than
RLGSI efficiency at predicting the net genetic merit. Thus, for this data set, the
RLPSI was a better predictor of the net genetic merit than the RLGSI in each cycle.

6.2 The Predetermined Proportional Gain Linear Genomic 
Selection Index 


6.2.1 Objective of the PPG-LGSI 


Let $d' = [d_1 \; d_2 \; \ldots \; d_r]$ be a vector $1 \times r$ ($r$ is the number of predetermined
proportional gains) of the predetermined proportional gains imposed by the breeder,
and assume that $\mu_q$ is the population mean of the $q$th trait before selection. The
objective of the predetermined proportional gain linear genomic selection index
(PPG-LGSI) is to change $\mu_q$ to $\mu_q + d_q$ in the testing population, where $d_q$ is a
predetermined change in $\mu_q$. It is possible to solve this problem by minimizing the mean
squared difference between $I_G = \beta'\gamma$ and $H = w'g$, $E[(H - I_G)^2]$, under the
restriction $U'\Gamma\beta = \theta_G d$, where $\theta_G$ is a proportionality constant, or under the
restriction $D'U'\Gamma\beta = 0$, where

$$D' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix}$$

is a matrix $(r - 1) \times r$ (see Chap. 3 for details), and $d_q$ ($q = 1, 2, \ldots, r$) is the $q$th element of vector
$d' = [d_1 \; d_2 \; \ldots \; d_r]$; $U'$ is a matrix $(t - 1) \times t$ of 1s and 0s, and $\Gamma = \{\sigma_{\gamma_{qq'}}\}$
($q, q' = 1, 2, \ldots, t$, $t$ = number of traits) is a covariance matrix of additive genomic
breeding values, $\gamma' = [\gamma_1 \; \gamma_2 \; \ldots \; \gamma_t]$.


6.2.2 The Maximized PPG-LGSI Parameters 


In this subsection, we minimize $E[(H - I_G)^2]$ under the restriction $D'U'\Gamma\beta = 0$ and
later under the restriction $U'\Gamma\beta = \theta_G d$. Under the restriction $D'U'\Gamma\beta = 0$, it is
necessary to minimize the function

$$f_P(\beta, v) = \beta'\Gamma\beta + w'Cw - 2w'\Gamma\beta + 2v'D'U'\Gamma\beta \tag{6.7}$$

with respect to $\beta$ and $v' = [v_1 \; v_2 \; \ldots \; v_{r-1}]$, where $v$ is a vector of Lagrange
multipliers. From a mathematical point of view, Eq. (6.7) has the same form as Eq. (6.1); thus,
the vector of coefficients $\beta$ of the PPG-LGSI should be similar to the vector of
coefficients of the RLGSI (Eq. 6.3), i.e., the PPG-LGSI vector of coefficients is equal
to

$$\beta_{PG} = K_P w, \tag{6.8}$$

where now $K_P = [I_t - Q_P]$, $Q_P = UD(D'U'\Gamma UD)^{-1}D'U'\Gamma$, $w$ is a vector of
economic weights, and $I_t$ is an identity matrix $t \times t$. When $D' = U'$, $\beta_{PG} = \beta_{RG}$
(the RLGSI vector of coefficients), and when $U'$ is a null matrix, $\beta_{PG} = w$ (the LGSI
vector of coefficients). This means that the PPG-LGSI includes the RLGSI and the
LGSI as particular cases.

Under the restriction $U'\Gamma\beta = \theta_G d$ (see Chap. 3 for details) the vector of
coefficients of the PPG-LGSI can be written as

$$\beta_{PG} = \beta_{RG} + \theta_G U(U'\Gamma U)^{-1}d, \tag{6.9}$$

where $\beta_{RG} = K_G w$ (Eq. 6.3), $K_G = [I_t - Q_G]$, $Q_G = U(U'\Gamma U)^{-1}U'\Gamma$, and
$d' = [d_1 \; d_2 \; \ldots \; d_r]$ is the vector of the predetermined proportional gains imposed by
the breeder. It can be shown that $\theta_G$, the proportionality constant, can be written as

$$\theta_G = \frac{d'(U'\Gamma U)^{-1}U'\Gamma w}{d'(U'\Gamma U)^{-1}d}. \tag{6.10}$$

When $\theta_G = 0$, $\beta_{PG} = \beta_{RG}$, and when $U'$ is a null matrix, $\beta_{PG} = w$. Equations (6.8)
and (6.9) give the same results, that is, both equations express the same result in a
different mathematical way.

The maximized selection response and expected genetic gain per trait of the
PPG-LGSI can be written as

$$R_{PG} = \frac{k_I}{L_G}\sqrt{\beta_{PG}'\Gamma\beta_{PG}} \tag{6.11}$$

and

$$E_{PG} = \frac{k_I}{L_G}\frac{\Gamma\beta_{PG}}{\sqrt{\beta_{PG}'\Gamma\beta_{PG}}} \tag{6.12}$$

respectively, where $L_G$ is the time required to complete a selection cycle using the
PPG-LGSI. Equations (6.11) and (6.12) depend only on GEBV information.
6.2.3 Statistical Properties of the PPG-LGSI


Assuming that $H = w'g$ and the PPG-LGSI ($I_{PG} = \beta_{PG}'\gamma$) have bivariate joint normal
distribution, $\beta_{PG} = K_P w$, and $\Gamma$, $C$, and $w$ are known, it can be shown that the PPG-LGSI
has the following statistical properties:

1. The vector $\beta_{PG} = K_P w$ minimizes the mean square error under the restriction
   $D'U'\Gamma\beta = 0$.

2. The variance of $I_{PG} = \beta_{PG}'\gamma$ ($\sigma^2_{I_{PG}} = \beta_{PG}'\Gamma\beta_{PG}$) is equal to the covariance between
   $I_{PG} = \beta_{PG}'\gamma$ and $H = w'g$ ($\sigma_{HI_{PG}} = w'\Gamma\beta_{PG}$).

3. The maximized correlation between $H$ and $I_{PG}$ (also called PPG-LGSI accuracy)
   is equal to $\rho_{HI_{PG}} = \sigma_{I_{PG}}/\sigma_H$, where $\sigma_{I_{PG}} = \sqrt{\beta_{PG}'\Gamma\beta_{PG}}$ and $\sigma_H = \sqrt{w'Cw}$ are the
   standard deviations of $I_{PG} = \beta_{PG}'\gamma$ and $H = w'g$ respectively.

4. The variance of the predicted error, $\mathrm{Var}(H - I_{PG}) = (1 - \rho^2_{HI_{PG}})\sigma^2_H$, is minimal.

The statistical PPG-LGSI properties are equal to the statistical PPG-LPSI
properties; thus, the PPG-LGSI is an application of the PPG-LPSI to the genomic
selection context.


6.2.4 Numerical Example 


To illustrate the PPG-LGSI theory, we use the estimated matrix

$$\hat{\Gamma} = \begin{bmatrix} 0.21 & 2.95 & 5.00 \\ 2.95 & 42.41 & 71.11 \\ 5.00 & 71.11 & 121.53 \end{bmatrix}$$

and the GEBVs associated with the traits GY (ton ha$^{-1}$), EHT (cm), and PHT (cm),
described in Sect. 6.1.3.

It is necessary to estimate the PPG-LGSI vector of coefficients
$\hat{\beta}_{PG} = \hat{\beta}_{RG} + \hat{\theta}_G U(U'\hat{\Gamma}U)^{-1}d$ (Eqs. 6.9 and 6.10). In Sect. 6.1.3, we showed that the estimated
vectors of coefficients $\hat{\beta}_{RG} = \hat{K}_G w$ for the null restrictions $U_1' = [1 \; 0 \; 0]$
and $U_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ were $\hat{\beta}_{RG_1}' = w'\hat{K}_{G_1}' = [3.78 \; -0.1 \; -0.1]$ and
$\hat{\beta}_{RG_2}' = w'\hat{K}_{G_2}' = [1.12 \; 0.09 \; -0.1]$ respectively, where $w' = [5 \; -0.1 \; -0.1]$. This
means that to estimate $\hat{\beta}_{PG} = \hat{\beta}_{RG} + \hat{\theta}_G U(U'\hat{\Gamma}U)^{-1}d$, we need only to estimate
$\hat{\theta}_G U(U'\hat{\Gamma}U)^{-1}d$ for both sets of restrictions.

Consider matrix $U_1' = [1 \; 0 \; 0]$ and let $d_1 = 7.0$ be the predetermined
proportional gain restriction for trait 1. We can estimate $\theta_G$ and $U(U'\Gamma U)^{-1}d$ as

$$\hat{\theta}_{G_1} = \frac{7.0(U_1'\hat{\Gamma}U_1)^{-1}U_1'\hat{\Gamma}w}{7.0(U_1'\hat{\Gamma}U_1)^{-1}7.0} = 0.036
\quad \text{and} \quad
U_1(U_1'\hat{\Gamma}U_1)^{-1}7.0 = \begin{bmatrix} 33.333 \\ 0 \\ 0 \end{bmatrix},$$

whence the PPG-LGSI vector of coefficients was

$$\hat{\beta}_{PG_1} = \hat{\beta}_{RG_1} + \hat{\theta}_{G_1}U_1(U_1'\hat{\Gamma}U_1)^{-1}7.0 = \begin{bmatrix} 5.0 \\ -0.1 \\ -0.1 \end{bmatrix},$$

and the estimated PPG-LGSI was $\hat{I}_{PG_1} = 5.0\,\mathrm{GEBV}_1 - 0.1\,\mathrm{GEBV}_2 - 0.1\,\mathrm{GEBV}_3$. In a similar
manner, we can estimate the PPG-LGSI vector of coefficients under restrictions
$U_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ and $d_2' = [7 \; -3]$. In this case,

$$\hat{\beta}_{PG_2} = \hat{\beta}_{RG_2} + \hat{\theta}_{G_2}U_2(U_2'\hat{\Gamma}U_2)^{-1}d_2 = \begin{bmatrix} 4.97 \\ -0.18 \\ -0.10 \end{bmatrix}$$

and the estimated PPG-LGSI was $\hat{I}_{PG_2} = 4.97\,\mathrm{GEBV}_1 - 0.18\,\mathrm{GEBV}_2 - 0.1\,\mathrm{GEBV}_3$.

Figure 6.2 presents the frequency distribution of the estimated PPG-LGSI
values for one (Fig. 6.2a) and two (Fig. 6.2b) predetermined restrictions, $d = 7$
and $d' = [7 \; -3]$ respectively, obtained in a real testing population for one
selection cycle in one environment. For both restrictions, the frequency distribution
of the estimated PPG-LGSI values approaches the normal distribution.

[Figure 6.2: histograms of the estimated PPG-LGSI values. Panel (a): PPG-LGSI values for one
restriction. Panel (b): PPG-LGSI values for two restrictions.]

Fig. 6.2 Distribution of 380 estimated predetermined proportional gain linear genomic selection
index (PPG-LGSI) values with one (a) and two (b) predetermined restrictions, d = 7 and d' =
[7 -3] respectively, obtained in a real testing population for one selection cycle in one
environment

Assume a selection intensity of 10% ($k_I = 1.755$); then, for one predetermined
restriction, where $U_1' = [1 \; 0 \; 0]$ and $d_1 = 7.0$, the estimated PPG-LGSI selection
response and expected genetic gain per trait, not including the interval length, were
$\hat{R}_{PG_1} = k_I\sqrt{\hat{\beta}_{PG_1}'\hat{\Gamma}\hat{\beta}_{PG_1}} = 1.05$ and
$\hat{E}_{PG_1}' = k_I\dfrac{\hat{\beta}_{PG_1}'\hat{\Gamma}}{\sqrt{\hat{\beta}_{PG_1}'\hat{\Gamma}\hat{\beta}_{PG_1}}} = [0.74 \; 9.92 \; 16.54]$
respectively. For two restrictions, with $U_2' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ and $d_2' = [7 \; -3]$, the
estimated PPG-LGSI selection response and expected genetic gains, not including the
interval length, were $\hat{R}_{PG_2} = k_I\sqrt{\hat{\beta}_{PG_2}'\hat{\Gamma}\hat{\beta}_{PG_2}} = 0.52$ and
$\hat{E}_{PG_2}' = k_I\dfrac{\hat{\beta}_{PG_2}'\hat{\Gamma}}{\sqrt{\hat{\beta}_{PG_2}'\hat{\Gamma}\hat{\beta}_{PG_2}}} = [0.11 \; -0.05 \; 0.14]$ respectively.

Now, we use the simulated data set described in Chap. 2, Sect. 2.8.1 to compare
PPG-LGSI efficiency versus predetermined proportional gain linear phenotypic
selection index (PPG-LPSI) efficiency. Let $U_1' = [1 \; 0 \; 0 \; 0]$,
$U_2' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$, and
$U_3' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$
be the matrices and $d_1 = 7$, $d_2' = [7 \; -3]$, and $d_3' = [7 \; -3 \; 5]$
the vectors for one, two, and three predetermined restrictions respectively.
Table 6.5 presents the estimated PPG-LPSI and PPG-LGSI selection response
for each predetermined restriction in five simulated selection cycles including
and not including the interval between selection cycles (4 years for the PPG-LPSI
and 1.5 years for the PPG-LGSI); estimated PPG-LPSI and PPG-LGSI accuracy;
and estimated variance of the predicted error (VPE). In each selection cycle, the
sample size was equal to 500 genotypes, each with four repetitions and four
traits. The selection intensity was 10% ($k_I = 1.755$).

The averages of the estimated PPG-LPSI selection response not including the 
interval length were 15.14, 14.87, and 13.30, whereas when the interval length was 
included, the average selection responses were 3.79, 3.72, and 3.33, for one, two, 
and three predetermined restrictions respectively (Table 6.5). The averages of the 
estimated PPG-LGSI selection responses not including the interval length for one, 
two, and three predetermined restrictions were 14.48, 13.47, and 11.26 respectively, 
and when the interval length was included, the selection responses were 9.65, 8.98, 
and 7.51 respectively (Table 6.5). These results indicate that when the interval length 
was included in the estimation of the PPG-LPSI and PPG-LGSI selection responses, 
PPG-LGSI efficiency was greater than PPG-LPSI efficiency, and vice versa, when 
the interval length was not included in the PPG-LPSI and PPG-LGSI selection 
responses, PPG-LPSI efficiency was higher than PPG-LGSI efficiency. 

The averages of the estimated VPE values of the PPG-LPSI for one, two, and 
three predetermined restrictions were 22.42, 30.56, and 41.17 respectively, whereas 
the estimated VPE values of the PPG-LGSI (see Sect. 6.2.3 for details) were 59.80, 
66.95, and 83.98, respectively, that is, in all selection cycles, the VPE of the 
PPG-LPSI was lower than that of the PPG-LGSI. This means that for this data set, 
the PPG-LPSI was a better predictor of the net genetic merit than the PPG-LGSI. 
These results can be explained by observing that the averages of the estimated 
PPG-LPSI accuracies were 0.88, 0.86, and 0.77, whereas the estimated PPG-LGSI 
accuracies were 0.65, 0.68, and 0.57 for each predetermined restriction, that is, the 
estimated PPG-LGSI accuracies were lower than the estimated PPG-LPSI accuracies 
for this data set. 
Table 6.6 Estimated PPG-LPSI heritability ($h^2_{I_P}$), values of $W_P = \dfrac{\rho_{HI_{PG}}}{h_{I_P}} L_P$ ($L_P = 4$), and the ratio of
the estimated PPG-LPSI accuracy ($\rho_{HI_P}$) to the estimated PPG-LGSI accuracy ($\rho_{HI_{PG}}$),
$\lambda_P = \rho_{HI_P}/\rho_{HI_{PG}}$, and values of $p = 100(\lambda_P - 1)$ for 1, 2 and 3 predetermined restrictions for
five simulated selection cycles

           PPG-LPSI heritability    Values of W_P           Estimated ratio values (p)
Cycle      1       2       3        1       2       3       1         2        3
1          0.84    0.77    0.83     4.71    4.13    3.72    -18.66    -6.71    -10.20
2          0.80    0.78    0.83     3.22    3.17    2.42    18.30     20.54    32.04
3          0.77    0.76    0.8      3.18    3.00    2.45    19.89     21.59    31.42
4          0.76    0.75    0.78     2.80    2.71    2.10    29.16     31.84    33.75
5          0.75    0.75    0.79     2.57    2.49    1.97    35.26     36.55    42.35
Average    0.72    0.71    0.76     3.29    3.12    2.53    16.80     20.76    25.87


Table 6.6 presents the estimated PPG-LPSI heritability ($h^2_{I_P}$) values, the
$W_P = \dfrac{\rho_{HI_{PG}}}{h_{I_P}} L_P$ ($L_P = 4$) values, the ratio of the estimated PPG-LPSI accuracy ($\rho_{HI_P}$)
to the estimated PPG-LGSI accuracy ($\rho_{HI_{PG}}$), i.e., $\lambda_P = \rho_{HI_P}/\rho_{HI_{PG}}$, and, finally, the
values of $p = 100(\lambda_P - 1)$ for one, two, and three predetermined restrictions for five
simulated selection cycles.

The averages of the $W_P$ values for one, two, and three predetermined restrictions were 3.29,
3.12, and 2.53, respectively, whereas the PPG-LGSI interval length was 1.5
($L_G = 1.5$). This means that the estimated Technow inequality, $L_G < \dfrac{\rho_{HI_{PG}}}{h_{I_P}} L_P$ (see
Chap. 5, Eq. 5.18), was true. Thus, PPG-LGSI efficiency in terms of time was greater
than PPG-LPSI efficiency for this data set. These results coincide with those
obtained earlier in this chapter, when we compared PPG-LGSI efficiency versus
PPG-LPSI efficiency in terms of interval length. However, the average values of
$p = 100(\lambda_P - 1)$ (see Chap. 5, Eq. 5.15) were, in percentage terms, 16.80%,
20.76%, and 25.87% for each restriction. These latter results indicate that for this
data set, the PPG-LPSI was a better predictor of the net genetic merit than the
PPG-LGSI. This is because the estimated PPG-LPSI accuracies were higher than the
estimated PPG-LGSI accuracies for this data set. We found similar results when we
compared the PPG-LPSI VPE versus PPG-LGSI VPE (Table 6.5).
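
The p values are a simple transformation of the accuracy ratio, and the averages quoted above can be recovered from the per-cycle entries of Table 6.6. The sketch below is an illustration only: the function names are hypothetical, and the first entry of the one-restriction column is read as -18.66.

```python
def percent_gain(accuracy_ratio):
    """p = 100 (lambda - 1): percentage by which one accuracy exceeds the other."""
    return 100.0 * (accuracy_ratio - 1.0)


def mean(values):
    return sum(values) / len(values)


# Per-cycle p values from Table 6.6 for 1, 2, and 3 predetermined restrictions.
P_VALUES = {
    1: [-18.66, 18.30, 19.89, 29.16, 35.26],
    2: [-6.71, 20.54, 21.59, 31.84, 36.55],
    3: [-10.20, 32.04, 31.42, 33.75, 42.35],
}

averages = {r: mean(values) for r, values in P_VALUES.items()}

# A ratio above 1 (p > 0) favors the phenotypic index; below 1 (p < 0) the genomic one.
example_p = percent_gain(1.25)

for r, avg in averages.items():
    print(r, round(avg, 2))
print(example_p)
```

Only in the first cycle are the p values negative, which means that the genomic index was the more accurate one in that cycle for all three restriction sets.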


6.3 The Combined Restricted Linear Genomic Selection 
Index 


The combined restricted linear genomic selection index (CRLGSI) is based on the 
RLPSI (Chap. 3) and combined linear genomic selection index (CLGSI, Chap. 5) 
theory. In the RLPSI, the breeder's objective is to improve only (t — r) of t (r < t) 
traits, leaving r of them fixed; the same is true for the CRLGSI, but in the latter case,
it is necessary to impose 2r restrictions, i.e., we need to fix r traits and their
associated r GEBVs to obtain results similar to those obtained with the RLPSI.
This is the main difference between the CRLGSI and the RLPSI. 

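It can be shown that $\mathrm{Cov}(I_C, \mathbf{a}_C) = \boldsymbol{\Psi}_C\boldsymbol{\beta}_C$ is the covariance between the breeding
value vector ($\mathbf{a}_C' = [\mathbf{g}' \;\; \boldsymbol{\gamma}']$) and the CLGSI, $I_C = \boldsymbol{\beta}_C'\mathbf{t}_C$ (see Chap. 5 for details),
where $\mathbf{t}_C' = [\mathbf{y}' \;\; \boldsymbol{\gamma}']$. In the CRLGSI, we want some covariances between the
linear combinations of $\mathbf{a}_C$ ($\mathbf{U}_C'\mathbf{a}_C$) and the CLGSI to be zero, i.e.,
$\mathrm{Cov}(I_C, \mathbf{U}_C'\mathbf{a}_C) = \mathbf{U}_C'\boldsymbol{\Psi}_C\boldsymbol{\beta}_C = \mathbf{0}$, where $\mathbf{U}_C'$ is a matrix $2(t - 1) \times 2t$ of 1s and 0s
(1 indicates that the trait and its associated GEBV are restricted, and 0 that the trait
and its GEBV have no restrictions) and
$\boldsymbol{\Psi}_C = \begin{bmatrix} \mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ is a block covariance matrix
of $\mathbf{a}_C' = [\mathbf{g}' \;\; \boldsymbol{\gamma}']$, where $\mathbf{C}$ and $\boldsymbol{\Gamma}$ are the covariance matrices of breeding ($\mathbf{g}$) and
genomic ($\boldsymbol{\gamma}$) values respectively. This problem can be solved by minimizing the
mean squared difference between the CLGSI and $H$ ($E[(H - I_C)^2]$) under the
restriction $\mathbf{U}_C'\boldsymbol{\Psi}_C\boldsymbol{\beta}_C = \mathbf{0}$, similar to the RLGSI in Sect. 6.1.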


6.3.1 The Maximized CRLGSI Parameters 


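Let $\mathbf{T}_C = \begin{bmatrix} \mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ be the block covariance matrix of $\mathbf{t}_C' = [\mathbf{y}' \;\; \boldsymbol{\gamma}']$, where $\mathbf{P}$ and $\boldsymbol{\Gamma}$
are the covariance matrices of phenotypic ($\mathbf{y}$) and genomic ($\boldsymbol{\gamma}$) values respectively.
Based on the Eq. (6.1) result, it can be shown that the CRLGSI vector of coefficients
that minimizes $E[(H - I_C)^2]$ under the restriction $\mathbf{U}_C'\boldsymbol{\Psi}_C\boldsymbol{\beta}_C = \mathbf{0}$ is

$$\boldsymbol{\beta}_{CR} = \mathbf{K}_C\boldsymbol{\beta}_C, \qquad (6.13)$$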


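where $\mathbf{K}_C = [\mathbf{I} - \mathbf{Q}_C]$, $\mathbf{Q}_C = \mathbf{T}_C^{-1}\boldsymbol{\Phi}_C(\boldsymbol{\Phi}_C'\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C)^{-1}\boldsymbol{\Phi}_C'$, $\boldsymbol{\Phi}_C' = \mathbf{U}_C'\boldsymbol{\Psi}_C$, and
$\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\boldsymbol{\Psi}_C\mathbf{a}_C$ (the vector of coefficients of the CLGSI, see Chap. 5 for details);
$\mathbf{T}_C^{-1}$ is the inverse of matrix $\mathbf{T}_C$, and $\mathbf{I}$ is an identity matrix $2t \times 2t$. When no
restrictions are imposed on any of the traits, $\mathbf{U}_C'$ is a null matrix and $\boldsymbol{\beta}_{CR} = \boldsymbol{\beta}_C$
(the vector of coefficients of the CLGSI). That is, the CRLGSI is more general
than the CLGSI. Similar to the RLPSI and the RLGSI, matrices $\mathbf{K}_C$ and $\mathbf{Q}_C$ are
idempotent ($\mathbf{K}_C = \mathbf{K}_C^2$ and $\mathbf{Q}_C = \mathbf{Q}_C^2$) and orthogonal ($\mathbf{K}_C\mathbf{Q}_C = \mathbf{Q}_C\mathbf{K}_C = \mathbf{0}$), that
is, $\mathbf{K}_C$ and $\mathbf{Q}_C$ are projectors. Thus, we can assume that the CRLGSI has
similar properties to those described for the RLPSI (see Chap. 3 for details)
when matrices $\boldsymbol{\Psi}_C = \begin{bmatrix} \mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ and $\mathbf{T}_C = \begin{bmatrix} \mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{bmatrix}$ are known.

The maximized selection response and the optimized expected genetic gain per
trait of the CRLGSI can be written as

$$R_{CR} = \frac{k_I}{L_I}\sqrt{\boldsymbol{\beta}_{CR}'\mathbf{T}_C\boldsymbol{\beta}_{CR}} \qquad (6.14)$$

and

$$\mathbf{E}_{CR} = \frac{k_I}{L_I}\frac{\boldsymbol{\Psi}_C\boldsymbol{\beta}_{CR}}{\sqrt{\boldsymbol{\beta}_{CR}'\mathbf{T}_C\boldsymbol{\beta}_{CR}}} \qquad (6.15)$$

respectively. Although in the RLGSI and the PPG-LGSI the interval between
selection cycles is denoted as $L_G$, in the CRLGSI it is denoted as $L_I$. This is because
the RLPSI and the CRLGSI should have the same interval between selection cycles.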


6.3.2 Numerical Examples 


To illustrate the CRLGSI theoretical results, we use a real training maize (Zea mays)
$F_2$ population with 248 genotypes (each with two repetitions), 233 molecular
markers, and three traits: GY (ton ha$^{-1}$), EHT (cm), and PHT (cm). Matrices
$\mathbf{P}$ and $\mathbf{C}$ were estimated based on Eqs. (2.22) to (2.24) described in Chap. 2. The
estimated matrices were

$$\hat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix} \quad \text{and} \quad \hat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}.$$

In a similar manner, we estimated matrix $\boldsymbol{\Gamma}$ using Eqs. (5.21) to (5.23) described in
Chap. 5. The estimated matrix was

$$\hat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}.$$

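To estimate the CRLGSI and its associated parameters (selection response,
expected genetic gain per trait, etc.), we need to obtain matrices
$\hat{\mathbf{T}}_C = \begin{bmatrix} \hat{\mathbf{P}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}} \end{bmatrix}$ and $\hat{\boldsymbol{\Psi}}_C = \begin{bmatrix} \hat{\mathbf{C}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}} \end{bmatrix}$
using phenotypic and genomic information, and the estimated CRLGSI vector of
coefficients $\hat{\boldsymbol{\beta}}_{CR} = \hat{\mathbf{K}}_C\hat{\boldsymbol{\beta}}_C$, where $\hat{\mathbf{K}}_C = [\mathbf{I} - \hat{\mathbf{Q}}_C]$,
$\hat{\mathbf{Q}}_C = \hat{\mathbf{T}}_C^{-1}\hat{\boldsymbol{\Phi}}_C(\hat{\boldsymbol{\Phi}}_C'\hat{\mathbf{T}}_C^{-1}\hat{\boldsymbol{\Phi}}_C)^{-1}\hat{\boldsymbol{\Phi}}_C'$, $\hat{\boldsymbol{\Phi}}_C' = \mathbf{U}_C'\hat{\boldsymbol{\Psi}}_C$, and $\hat{\boldsymbol{\beta}}_C = \hat{\mathbf{T}}_C^{-1}\hat{\boldsymbol{\Psi}}_C\mathbf{a}_C$.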

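We have indicated that the main difference between the RLGSI and the CRLGSI
is matrix $\mathbf{U}_C'$, on which we now need to impose two restrictions: one for the trait and
another for its associated GEBV. Consider the maize (Zea mays) $F_2$ population described
earlier and suppose that we restrict trait GY; then, matrix $\mathbf{U}_C'$ should be constructed as

$$\mathbf{U}_{C_1}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}.$$

If we restrict traits GY and EHT, matrix $\mathbf{U}_C'$ should be constructed as

$$\mathbf{U}_{C_2}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix},$$

etc. The procedure for obtaining matrices $\hat{\mathbf{K}}_C = [\mathbf{I} - \hat{\mathbf{Q}}_C]$,
$\hat{\mathbf{Q}}_C = \hat{\mathbf{T}}_C^{-1}\hat{\boldsymbol{\Phi}}_C(\hat{\boldsymbol{\Phi}}_C'\hat{\mathbf{T}}_C^{-1}\hat{\boldsymbol{\Phi}}_C)^{-1}\hat{\boldsymbol{\Phi}}_C'$, and $\hat{\boldsymbol{\Phi}}_C' = \mathbf{U}_C'\hat{\boldsymbol{\Psi}}_C$ is
similar to that described in Chap. 3.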

Let $\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1 \;\; 0 \;\; 0 \;\; 0]$ be the vector of economic weights
and assume that we restrict trait GY; in this case, according to the estimated matrices
$\hat{\mathbf{P}}$, $\hat{\mathbf{C}}$, and $\hat{\boldsymbol{\Gamma}}$ described earlier, the estimated CRLGSI vector of coefficients was
$\hat{\boldsymbol{\beta}}_{CR}' = [0.076 \;\; -0.004 \;\; -0.018 \;\; 2.353 \;\; -0.096 \;\; -0.082]$, whence the
estimated CRLGSI can be written as

$$\hat{I}_{CR} = 0.076\,\mathrm{GY} - 0.004\,\mathrm{EHT} - 0.018\,\mathrm{PHT} + 2.353\,\mathrm{GEBV_{GY}} - 0.096\,\mathrm{GEBV_{EHT}} - 0.082\,\mathrm{GEBV_{PHT}},$$

where $\mathrm{GEBV_{GY}}$, $\mathrm{GEBV_{EHT}}$, and $\mathrm{GEBV_{PHT}}$ are the GEBVs associated with traits
GY, EHT, and PHT respectively. The same procedure is valid for two or more
restrictions.
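
The construction of the CRLGSI coefficients (Eq. 6.13) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' software: the function name `crlgsi_coefficients` and the variable names are assumptions introduced here, the six-element weight vector $\mathbf{w}'$ of the example is used in place of $\mathbf{a}_C$, and the interval $L_I$ is taken as 1. The printed values in the text are not asserted; the sketch only checks the structural properties (null covariances and projector identities) stated above.

```python
import numpy as np

# Estimated maize F2 matrices quoted in the text (traits: GY, EHT, PHT).
P = np.array([[0.45, 1.33, 2.33],
              [1.33, 65.07, 83.71],
              [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06],
              [0.61, 17.93, 22.75],
              [1.06, 22.75, 44.53]])
G = np.array([[0.07, 0.65, 1.05],     # Gamma: covariance matrix of the GEBVs
              [0.65, 10.62, 14.25],
              [1.05, 14.25, 26.37]])

T_C = np.block([[P, G], [G, G]])      # covariance of t_C' = [y' gamma']
Psi_C = np.block([[C, G], [G, G]])    # covariance of a_C' = [g' gamma']
w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])

# U_C1': restrict GY (first column) and its GEBV (fourth column).
U_t = np.array([[1, 0, 0, 0, 0, 0],
                [0, 0, 0, 1, 0, 0]], dtype=float)


def crlgsi_coefficients(T, Psi, weights, U_transposed):
    """Return (beta_CR, K_C, Q_C) following Eq. (6.13). Hypothetical helper."""
    T_inv = np.linalg.inv(T)
    beta_c = T_inv @ Psi @ weights                 # unrestricted CLGSI vector
    Phi = Psi @ U_transposed.T                     # Phi_C (Psi_C is symmetric)
    Q = T_inv @ Phi @ np.linalg.inv(Phi.T @ T_inv @ Phi) @ Phi.T
    K = np.eye(T.shape[0]) - Q
    return K @ beta_c, K, Q


beta_cr, K_C, Q_C = crlgsi_coefficients(T_C, Psi_C, w, U_t)

k_I = 1.755                                        # 10% selection intensity
sigma_I = np.sqrt(beta_cr @ T_C @ beta_cr)         # standard deviation of the index
response = k_I * sigma_I                           # Eq. (6.14) with L_I = 1 (assumption)
gains = k_I * (Psi_C @ beta_cr) / sigma_I          # Eq. (6.15) with L_I = 1 (assumption)
```

By construction, the entries of `gains` that correspond to GY and its GEBV are zero up to rounding error, which is the behavior that the null restrictions are meant to enforce.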

Figure 6.3 presents the frequency distribution of the estimated CRLGSI values for
one (Fig. 6.3a) and two null restrictions (Fig. 6.3b) using matrices $\mathbf{U}_{C_1}'$ and $\mathbf{U}_{C_2}'$ and
the real data set of the $F_2$ population. For both restrictions, the frequency distribution
of the estimated CRLGSI values approaches normal distribution.

Suppose a selection intensity of 10% ($k_I = 1.755$), matrix
$\mathbf{U}_{C_1}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$, and that the vector of economic weights is
$\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1 \;\; 0 \;\; 0 \;\; 0]$; then, according to the estimated matrices $\hat{\mathbf{P}}$, $\hat{\mathbf{C}}$, and $\hat{\boldsymbol{\Gamma}}$
described earlier, the estimated CRLGSI selection response and the estimated
CRLGSI expected genetic gain per trait were
$\hat{R}_{CR} = k_I\sqrt{\hat{\boldsymbol{\beta}}_{CR}'\hat{\mathbf{T}}_C\hat{\boldsymbol{\beta}}_{CR}} = 0.96$ and
$\hat{\mathbf{E}}_{CR}' = k_I\frac{\hat{\boldsymbol{\beta}}_{CR}'\hat{\boldsymbol{\Psi}}_C}{\sqrt{\hat{\boldsymbol{\beta}}_{CR}'\hat{\mathbf{T}}_C\hat{\boldsymbol{\beta}}_{CR}}} = [0 \;\; -3.53 \;\; -6.03 \;\; 0 \;\; -2.93 \;\; -4.87]$ respectively,
whereas the estimated CRLGSI accuracy was $\hat{\rho}_{HI_{CR}} = \frac{\hat{\sigma}_{I_{CR}}}{\hat{\sigma}_H} = 0.51$ (see Chaps. 3
and 5 for details).

Now, we use the simulated data described in Chap. 2, Sect. 2.8.1 to compare
CRLGSI efficiency versus RLGSI efficiency. The criteria for this comparison are the
Technow inequality (Eq. 5.18, Chap. 5) and the ratio of the estimated CRLGSI
accuracy ($\hat{\rho}_{HI_{CR}}$) to the estimated RLGSI accuracy ($\hat{\rho}_{HI_{RG}}$) expressed as percentages
(Eq. 5.17, Chap. 5), i.e., $p = 100(\lambda_{CR} - 1)$, where $\lambda_{CR} = \hat{\rho}_{HI_{CR}}/\hat{\rho}_{HI_{RG}}$, for one, two,
and three null restrictions for five simulated selection cycles.

Table 6.7 presents the estimated CRLGSI heritability ($\hat{h}^2_{I_{CR}}$), the estimated RLGSI
accuracy ($\hat{\rho}_{HI_{RG}}$), the values of $W_C = \frac{\hat{\rho}_{HI_{RG}}}{\hat{h}_{I_{CR}}} L_I$ ($L_I = 4$), and the values of
$p = 100(\lambda_{CR} - 1)$, where $\lambda_{CR} = \hat{\rho}_{HI_{CR}}/\hat{\rho}_{HI_{RG}}$ and $\hat{\rho}_{HI_{CR}}$ is the estimated CRLGSI
accuracy, for one, two, and three null restrictions for five simulated selection cycles.

[Figure 6.3: panel (a) CRLGSI values for one restriction; panel (b) CRLGSI values for two restrictions]

Fig. 6.3 Distribution of 244 estimated combined restricted linear genomic selection index
(CRLGSI) values with one (a) and two (b) null restrictions respectively obtained in a real training
population for one selection cycle in one environment


The averages of the $W_C = \frac{\hat{\rho}_{HI_{RG}}}{\hat{h}_{I_{CR}}} L_I$ values for one, two, and three null restrictions
were 1.26, 0.92, and 0.59 respectively, whereas the RLGSI interval length was 1.5
($L_G = 1.5$). This means that the estimated Technow inequality
($L_G < \frac{\hat{\rho}_{HI_{RG}}}{\hat{h}_{I_{CR}}} L_I$) was not true. Thus, for this data set, RLGSI efficiency in terms of
time is not greater than CRLGSI efficiency. The inequality was not true because the
estimated RLGSI accuracy was very low, whereas CRLGSI heritability was high. Thus, note
that the averages of the estimated RLGSI accuracy for one, two, and three null
restrictions were 0.25, 0.19, and 0.14 respectively, whereas the averages of the
estimated CRLGSI heritability values were 0.72, 0.75, and 0.89 respectively.
Thus, according to these results, when the estimated RLGSI accuracy is very low
and the estimated CRLGSI heritability is high, RLGSI efficiency will be lower than
CRLGSI efficiency in terms of time.
The last three columns of Table 6.7, from left to right, present the average of the
values of $p = 100(\lambda_{CR} - 1)$ for one, two, and three null restrictions of five
simulated selection cycles. According to these results, CRLGSI efficiency was
53.78%, 78.25%, and 61.25% higher than RLGSI efficiency. Thus, for this data
set, the CRLGSI was a better predictor of the net genetic merit than the RLGSI.
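
The two comparison criteria used above are simple enough to be written as small helper functions. The sketch below is illustrative only; the function names are assumptions, and plugging averaged accuracies and heritabilities into the inequality (rather than averaging the per-cycle ratios, as the tables do) gives values close to, but not identical to, the tabulated averages.

```python
import math


def technow_holds(l_g, accuracy_genomic, heritability_index, l_i):
    """Check the Technow inequality L_G < (rho_HI_G / h_I) * L_I (Chap. 5, Eq. 5.18).

    accuracy_genomic   : accuracy of the purely genomic index (e.g., RLGSI)
    heritability_index : heritability (h^2) of the index that uses phenotypes
    """
    w_value = accuracy_genomic / math.sqrt(heritability_index) * l_i
    return l_g < w_value


def efficiency_gain_pct(accuracy_a, accuracy_b):
    """Return p = 100 * (lambda - 1), with lambda = accuracy_a / accuracy_b."""
    return 100.0 * (accuracy_a / accuracy_b - 1.0)


# One null restriction: average RLGSI accuracy 0.25, CRLGSI heritability 0.72,
# L_G = 1.5 and L_I = 4, as reported in the text.
rlgsi_faster = technow_holds(1.5, 0.25, 0.72, 4.0)
```

With the averages quoted for one null restriction, `rlgsi_faster` is `False`, in agreement with the conclusion that the Technow inequality did not hold for this comparison.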


6.4 The Combined Predetermined Proportional Gains 
Linear Genomic Selection Index 


In the PPG-LPSI described in Chap. 3, the vector of the PPG (predetermined
proportional gains) was $\mathbf{d}' = [d_1 \;\; d_2 \;\; \cdots \;\; d_r]$. However, because the combined
predetermined proportional gains LGSI (CPPG-LGSI) uses phenotypic and GEBV
information jointly to predict the net genetic merit, the vector of the PPG ($\mathbf{d}_C$) should
be twice the length of the standard vector $\mathbf{d}'$, that is,
$\mathbf{d}_C' = [d_1 \;\; d_2 \;\; \cdots \;\; d_r \;\; d_{r+1} \;\; d_{r+2} \;\; \cdots \;\; d_{2r}]$, where we would expect that if
$d_1$ is the PPG imposed on trait 1, then $d_{r+1}$ should be the PPG imposed on the
GEBV associated with trait 1, etc. In addition, in the CPPG-LGSI, we have three
possible options for determining (for each trait and GEBV) the PPG, e.g., for trait
1, $d_1 = d_{r+1}$, $d_1 > d_{r+1}$, or $d_1 < d_{r+1}$. This is the main difference between the
standard PPG-LPSI described in Chap. 3 and the CPPG-LGSI.


6.4.1 The Maximized CPPG-LGSI Parameters 


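It can be shown that the vector of coefficients of the CPPG-LGSI can be written as

$$\boldsymbol{\beta}_{CP} = \boldsymbol{\beta}_{CR} + \theta_{CP}\boldsymbol{\delta}_{CP}, \qquad (6.16)$$

where

$$\theta_{CP} = \frac{\boldsymbol{\beta}_C'\boldsymbol{\Phi}_C(\boldsymbol{\Phi}_C'\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C)^{-1}\mathbf{d}_C}{\mathbf{d}_C'(\boldsymbol{\Phi}_C'\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C)^{-1}\mathbf{d}_C} \qquad (6.17)$$

is a proportionality constant. In addition, in Eq. (6.16), $\boldsymbol{\beta}_{CR} = \mathbf{K}_C\boldsymbol{\beta}_C$ is the vector of
coefficients of the CRLGSI (Eq. 6.13), $\boldsymbol{\delta}_{CP} = \mathbf{T}_C^{-1}\boldsymbol{\Phi}_C(\boldsymbol{\Phi}_C'\mathbf{T}_C^{-1}\boldsymbol{\Phi}_C)^{-1}\mathbf{d}_C$,
$\boldsymbol{\Phi}_C' = \mathbf{U}_C'\boldsymbol{\Psi}_C$, and $\boldsymbol{\beta}_C = \mathbf{T}_C^{-1}\boldsymbol{\Psi}_C\mathbf{a}_C$ (the vector of coefficients of the CLGSI).
When $\theta_{CP} = 0$, $\boldsymbol{\beta}_{CP} = \boldsymbol{\beta}_{CR}$, and if $\theta_{CP} = 0$ and $\mathbf{U}_C'$ is the null matrix, then $\boldsymbol{\beta}_{CP} = \boldsymbol{\beta}_C$.
Thus, the CPPG-LGSI is more general than the CRLGSI and the CLGSI, and
includes the latter two indices as particular cases. In addition, it can be shown that
the CPPG-LGSI has the same properties as the PPG-LPSI described in Chap. 3.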

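The maximized selection response and the expected genetic gain per trait of the
CPPG-LGSI can be written as

$$R_{CP} = \frac{k_I}{L_I}\sqrt{\boldsymbol{\beta}_{CP}'\mathbf{T}_C\boldsymbol{\beta}_{CP}} \qquad (6.18)$$

and

$$\mathbf{E}_{CP} = \frac{k_I}{L_I}\frac{\boldsymbol{\Psi}_C\boldsymbol{\beta}_{CP}}{\sqrt{\boldsymbol{\beta}_{CP}'\mathbf{T}_C\boldsymbol{\beta}_{CP}}} \qquad (6.19)$$

respectively. Although in the RLGSI and the PPG-LGSI the interval between
selection cycles is denoted as $L_G$, in the CPPG-LGSI it is denoted as $L_I$. This is
because the RLPSI and the CPPG-LGSI should have the same interval between
selection cycles because they use phenotypic information to predict the net genetic
merit.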


6.4.2 Numerical Examples 


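Similar to the CRLGSI, to illustrate the CPPG-LGSI results we use the
real training maize (Zea mays) $F_2$ population with 248 genotypes,
233 molecular markers, and three traits, GY (ton ha$^{-1}$), EHT (cm), and PHT
(cm), where

$$\hat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix}, \quad \hat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}, \quad \text{and} \quad \hat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}$$

were the estimated matrices of $\mathbf{P}$, $\mathbf{C}$, and $\boldsymbol{\Gamma}$ respectively.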

We can obtain the estimated CPPG-LGSI vector of coefficients as
$\hat{\boldsymbol{\beta}}_{CP} = \hat{\boldsymbol{\beta}}_{CR} + \hat{\theta}_{CP}\hat{\boldsymbol{\delta}}_{CP}$ (Eq. 6.16). Suppose that we restrict trait GY and its associated GEBV
with matrix $\mathbf{U}_{C_1}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$ and the vector of predetermined restrictions
$\mathbf{d}_{C_1}' = [7 \;\; 3.5]$. In Sect. 6.3.2, we showed that the estimated CRLGSI vector of
coefficients was $\hat{\boldsymbol{\beta}}_{CR}' = [0.076 \;\; -0.004 \;\; -0.018 \;\; 2.353 \;\; -0.096 \;\; -0.082]$;
then, we only need to calculate $\hat{\theta}_{CP}$ and $\hat{\boldsymbol{\delta}}_{CP}$ to obtain the vector of coefficients $\hat{\boldsymbol{\beta}}_{CP}$.

Let $\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1 \;\; 0 \;\; 0 \;\; 0]$ be the vector of economic weights. It
can be shown that $\hat{\theta}_{CP} = 0.00030$ is the estimated value of the proportionality
constant and $\hat{\boldsymbol{\delta}}_{CP}' = [0.56 \;\; -77.28 \;\; 40.89 \;\; 49.44 \;\; 77.28 \;\; -40.89]$. Thus,
the estimated CPPG-LGSI vector of coefficients was
$\hat{\boldsymbol{\beta}}_{CP}' = [0.076 \;\; -0.030 \;\; -0.004 \;\; 2.369 \;\; -0.070 \;\; -0.096]$, whence the
estimated CPPG-LGSI can be written as

$$\hat{I}_{CP} = 0.076\,\mathrm{GY} - 0.030\,\mathrm{EHT} - 0.004\,\mathrm{PHT} + 2.369\,\mathrm{GEBV_{GY}} - 0.070\,\mathrm{GEBV_{EHT}} - 0.096\,\mathrm{GEBV_{PHT}},$$

where $\mathrm{GEBV_{GY}}$, $\mathrm{GEBV_{EHT}}$, and $\mathrm{GEBV_{PHT}}$ are the GEBVs associated with traits
GY, EHT, and PHT respectively. The same procedure is valid for two or more
restrictions. Note that because $\hat{\theta}_{CP} = 0.0003$ is very small, the estimated CPPG-LGSI
and CRLGSI values were very similar.
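
The steps of Eqs. (6.16) and (6.17) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: all variable names are assumptions, and the six-element weight vector of the example is used in place of $\mathbf{a}_C$. The sketch does not assert the printed coefficients; it only checks the defining property that the restricted covariances $\boldsymbol{\Phi}_C'\boldsymbol{\beta}_{CP}$ are proportional to $\mathbf{d}_C$.

```python
import numpy as np

# Estimated maize F2 matrices quoted in the text (traits: GY, EHT, PHT).
P = np.array([[0.45, 1.33, 2.33],
              [1.33, 65.07, 83.71],
              [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06],
              [0.61, 17.93, 22.75],
              [1.06, 22.75, 44.53]])
G = np.array([[0.07, 0.65, 1.05],
              [0.65, 10.62, 14.25],
              [1.05, 14.25, 26.37]])

T_C = np.block([[P, G], [G, G]])
Psi_C = np.block([[C, G], [G, G]])
w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])
U_t = np.array([[1, 0, 0, 0, 0, 0],          # restrict GY ...
                [0, 0, 0, 1, 0, 0]], dtype=float)  # ... and its GEBV
d_C = np.array([7.0, 3.5])                   # predetermined proportional gains

T_inv = np.linalg.inv(T_C)
Phi = Psi_C @ U_t.T                          # Phi_C (Psi_C is symmetric)
M_inv = np.linalg.inv(Phi.T @ T_inv @ Phi)   # (Phi' T^-1 Phi)^-1

beta_c = T_inv @ Psi_C @ w                                     # CLGSI
beta_cr = (np.eye(6) - T_inv @ Phi @ M_inv @ Phi.T) @ beta_c   # CRLGSI, Eq. (6.13)
delta_cp = T_inv @ Phi @ M_inv @ d_C                           # delta_CP
theta_cp = (beta_c @ Phi @ M_inv @ d_C) / (d_C @ M_inv @ d_C)  # Eq. (6.17)
beta_cp = beta_cr + theta_cp * delta_cp                        # Eq. (6.16)
```

Because $\boldsymbol{\Phi}_C'\boldsymbol{\beta}_{CR} = \mathbf{0}$ and $\boldsymbol{\Phi}_C'\boldsymbol{\delta}_{CP} = \mathbf{d}_C$, the product `Phi.T @ beta_cp` equals `theta_cp * d_C`, which is how the index forces the restricted gains into the proportions given by $\mathbf{d}_C$.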

Figure 6.4 presents the frequency distribution of the estimated CPPG-LGSI
values for one (Fig. 6.4a) and two predetermined restrictions (Fig. 6.4b) using
matrices $\mathbf{U}_{C_1}'$ and
$\mathbf{U}_{C_2}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$, the vectors of the PPG
$\mathbf{d}_{C_1}' = [7 \;\; 3.5]$ and $\mathbf{d}_{C_2}' = [7 \;\; -3 \;\; 3.5 \;\; -1.5]$, and the real data set $F_2$. For both
restrictions, the frequency distribution of the estimated CPPG-LGSI values
approaches normal distribution.

[Figure 6.4: panel (a) CPPG-LGSI values for one restriction; panel (b) CPPG-LGSI values for two restrictions]

Fig. 6.4 Distribution of 244 estimated combined predetermined proportional gain linear genomic
selection index (CPPG-LGSI) values with one (a) and two (b) predetermined restrictions, $d = 7$ and
$\mathbf{d}' = [7 \;\; -3]$ respectively, obtained in a real training population for one selection cycle in one
environment

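Suppose a selection intensity of 10% ($k_I = 1.755$) and that we restrict trait GY and
its associated GEBV. The estimated CPPG-LGSI selection response and expected
genetic gain per trait were $\hat{R}_{CP} = k_I\sqrt{\hat{\boldsymbol{\beta}}_{CP}'\hat{\mathbf{T}}_C\hat{\boldsymbol{\beta}}_{CP}} = 0.98$ and
$\hat{\mathbf{E}}_{CP}' = k_I\frac{\hat{\boldsymbol{\beta}}_{CP}'\hat{\boldsymbol{\Psi}}_C}{\sqrt{\hat{\boldsymbol{\beta}}_{CP}'\hat{\mathbf{T}}_C\hat{\boldsymbol{\beta}}_{CP}}} = [0.007 \;\; -3.647 \;\; -5.760 \;\; 0.004 \;\; -2.829 \;\; 4.711]$ respectively, whereas
the estimated CPPG-LGSI accuracy was $\hat{\rho}_{HI_{CP}} = \frac{\hat{\sigma}_{I_{CP}}}{\hat{\sigma}_H} = 0.52$. Once again, because
$\hat{\theta}_{CP} = 0.0003$, the latter results are very similar to the CRLGSI results.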

Now, we use the simulated data described in Chap. 2, Sect. 2.8.1, to compare
CPPG-LGSI efficiency versus PPG-LGSI efficiency. The criteria for this comparison
are the Technow inequality (Chap. 5, Eq. 5.18) and the ratio of CPPG-LGSI
accuracy ($\hat{\rho}_{HI_{CP}}$) to PPG-LGSI accuracy ($\hat{\rho}_{HI_{PG}}$) expressed as percentages (Chap. 5,
Eq. 5.17), $p = 100(\lambda_{CP} - 1)$, where $\lambda_{CP} = \hat{\rho}_{HI_{CP}}/\hat{\rho}_{HI_{PG}}$, for one, two, and three null
restrictions in five simulated selection cycles.

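Table 6.8 presents the estimated CPPG-LGSI heritability ($\hat{h}^2_{I_{CP}}$), the
estimated PPG-LGSI accuracy ($\hat{\rho}_{HI_{PG}}$), values of $W_{CP} = \frac{\hat{\rho}_{HI_{PG}}}{\hat{h}_{I_{CP}}} L_I$ ($L_I = 4$), and
$p = 100(\lambda_{CP} - 1)$, where $\lambda_{CP} = \hat{\rho}_{HI_{CP}}/\hat{\rho}_{HI_{PG}}$ and $\hat{\rho}_{HI_{CP}}$ is the estimated CPPG-LGSI
accuracy, for one, two, and three null restrictions in five simulated selection cycles.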

The averages of the estimated $W_{CP}$ values for one, two, and three predetermined
restrictions were 3.60, 3.31, and 2.50 respectively, whereas the PPG-LGSI interval
length was 1.5 ($L_G = 1.5$). This means that the estimated Technow inequality,
$L_G < \frac{\hat{\rho}_{HI_{PG}}}{\hat{h}_{I_{CP}}} L_I$, was true. Thus, for this data set, PPG-LGSI efficiency is greater
than CPPG-LGSI efficiency in terms of time.

The last three columns of Table 6.8, from left to right, present the values of
$p = 100(\lambda_{CP} - 1)$ for one, two, and three null restrictions in five simulated selection
cycles. The average values of $p = 100(\lambda_{CP} - 1)$ for each of the three restrictions,
in percentage terms, were 37.19%, 32.82%, and 37.08% respectively. This
means that the CPPG-LGSI efficiency was greater than PPG-LGSI efficiency at
predicting the net genetic merit.
Chapter 7
Linear Phenotypic Eigen Selection Index Methods


Abstract Based on the canonical correlation, on the singular value decomposition 
(SVD), and on the linear phenotypic selection indices theory, we describe the eigen 
selection index method (ESIM), the restricted ESIM (RESIM), and the 
predetermined proportional gain ESIM (PPG-ESIM), which use only phenotypic 
information to predict the net genetic merit. The ESIM is an unrestricted linear 
selection index, but the RESIM and PPG-ESIM are linear selection indices that 
allow null and predetermined restrictions respectively to be imposed on the expected 
genetic gains of some traits, whereas the rest remain without any restrictions. The 
aims of the three indices are to predict the unobservable net genetic merit values of 
the candidates for selection, maximize the selection response, and the accuracy, 
and provide the breeder with an objective rule for evaluating and selecting several 
traits simultaneously. Their main characteristics are: they do not require the eco- 
nomic weights to be known, the first multi-trait heritability eigenvector is used as its 
vector of coefficients; and because of the properties associated with eigen analysis, it 
is possible to use the theory of similar matrices to change the direction and propor- 
tion of the expected genetic gain values without affecting the accuracy. We describe 
the foregoing three indices and validate their theoretical results using real and 
simulated data. 


7.1 The Linear Phenotypic Eigen Selection Index Method 


The conditions described in Chap. 2 for the linear phenotypic selection index (LPSI) 
are necessary and sufficient for constructing the linear phenotypic eigen selection index 
method (ESIM). The ESIM index can be written as $I = \mathbf{b}'\mathbf{y}$, where $\mathbf{b}' = [b_1 \;\; b_2 \;\; \cdots \;\; b_t]$
is the unknown index vector of coefficients, $t$ is the number of traits, and
$\mathbf{y}' = [y_1 \;\; y_2 \;\; \cdots \;\; y_t]$ is a known vector of trait phenotypic values. The objectives
of ESIM are:

1. To predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$, where $\mathbf{g}' = [g_1 \;\; g_2 \;\; \cdots \;\; g_t]$ is
the unknown vector of true breeding values for an individual and
$\mathbf{w}' = [w_1 \;\; w_2 \;\; \cdots \;\; w_t]$ is a vector of unknown economic weights.

© The Author(s) 2018 149 


J. J. Céron-Rojas, J. Crossa, Linear Selection Indices in Modern Plant Breeding, 
https://doi.org/10.1007/978-3-319-91223-3 7 




2. To maximize the ESIM selection response and the accuracy. 

3. To select individuals with the highest H values in each selection cycle as parents 
of the next generation. 

4. To provide the breeder with an objective rule for evaluating and selecting several 
traits simultaneously. 


Although in the context of the LPSI w is a known and fixed vector of economic 
weights, in the ESIM w is fixed, but unknown and its values must be estimated in 
each selection cycle. This latter assumption is the fundamental difference between 
the ESIM and the LPSI and implies that the ESIM is more general than the LPSI. 
Thus, when w is known, the LPSI and ESIM give the same results. 


7.1.1 The ESIM Parameters 


The theoretical ESIM selection response can be written as 


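$$R_I = k_I\sigma_H\rho_{HI}, \qquad (7.1)$$

where $k_I$ is the standardized selection differential (or selection intensity),
$\sigma_H = \sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}$ is the standard deviation of $H$, $\rho_{HI} = \frac{\mathbf{w}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{w}'\mathbf{C}\mathbf{w}}\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ is the correlation,
and $\mathbf{w}'\mathbf{C}\mathbf{b} = \sigma_{HI}$ the covariance between $H$ and $I$ respectively, $\sigma_I = \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ is the
standard deviation of $I$, $\mathbf{C}$ is the covariance matrix of the true breeding values ($\mathbf{g}$),
and $\mathbf{P}$ is the covariance matrix of the trait phenotypic values ($\mathbf{y}$).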

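In the ESIM, it is assumed that $k_I$ and $\sigma_H$ are fixed, and that $\mathbf{C}$ and $\mathbf{P}$ are known;
thus, to maximize Eq. (7.1), it is necessary to maximize
$\rho^2_{HI} = \frac{(\mathbf{w}'\mathbf{C}\mathbf{b})^2}{(\mathbf{w}'\mathbf{C}\mathbf{w})(\mathbf{b}'\mathbf{P}\mathbf{b})}$ with
respect to vectors $\mathbf{b}$ and $\mathbf{w}$ under the restrictions $\sigma^2_H = \mathbf{w}'\mathbf{C}\mathbf{w}$, $\sigma^2_I = \mathbf{b}'\mathbf{P}\mathbf{b}$, and
$0 < \sigma^2_H, \sigma^2_I < \infty$, where $\sigma^2_H = \mathbf{w}'\mathbf{C}\mathbf{w}$ is the variance of $H = \mathbf{w}'\mathbf{g}$ and $\sigma^2_I = \mathbf{b}'\mathbf{P}\mathbf{b}$ is the variance
of $I = \mathbf{b}'\mathbf{y}$. That is, it is necessary to maximize the function

$$f(\mathbf{b}, \mathbf{w}, \mu, \phi) = (\mathbf{w}'\mathbf{C}\mathbf{b})^2 - \mu(\mathbf{b}'\mathbf{P}\mathbf{b} - \sigma^2_I) - \phi(\mathbf{w}'\mathbf{C}\mathbf{w} - \sigma^2_H) \qquad (7.2)$$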


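with respect to $\mathbf{b}$, $\mathbf{w}$, $\mu$, and $\phi$, where $\mu$ and $\phi$ are Lagrange multipliers. The
derivative results of Eq. (7.2) with respect to $\mathbf{b}$, $\mathbf{w}$, $\mu$, and $\phi$ are:

$$(\mathbf{w}'\mathbf{C}\mathbf{b})\mathbf{C}\mathbf{w} - \mu\mathbf{P}\mathbf{b} = \mathbf{0}, \qquad (7.3)$$
$$(\mathbf{w}'\mathbf{C}\mathbf{b})\mathbf{C}\mathbf{b} - \phi\mathbf{C}\mathbf{w} = \mathbf{0}, \qquad (7.4)$$
$$\mathbf{b}'\mathbf{P}\mathbf{b} = \sigma^2_I \text{ and } \mathbf{w}'\mathbf{C}\mathbf{w} = \sigma^2_H, \qquad (7.5)$$

respectively, where Eq. (7.5) denotes the restrictions imposed for maximizing $\rho^2_{HI}$. It
can be shown that $\mathbf{w}'\mathbf{C}\mathbf{b} = \sqrt{\mu\sigma^2_I} = \sqrt{\phi\sigma^2_H} = \theta^{1/2}$; then, Eqs. (7.3) and (7.4) can be
written as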
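$$\theta^{1/2}\mathbf{C}\mathbf{w} - \frac{\theta}{\sigma^2_I}\mathbf{P}\mathbf{b} = \mathbf{0} \qquad (7.6)$$

and

$$\theta^{1/2}\mathbf{C}\mathbf{b} - \frac{\theta}{\sigma^2_H}\mathbf{C}\mathbf{w} = \mathbf{0}, \qquad (7.7)$$

respectively. Equation (7.6) is equal to $\mathbf{C}\mathbf{w} = \frac{\theta^{1/2}}{\sigma^2_I}\mathbf{P}\mathbf{b}$; then, vector $\mathbf{w}$ can be written
as

$$\mathbf{w}_E = \frac{\theta^{1/2}}{\sigma^2_I}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}. \qquad (7.8)$$

By the result of Eq. (7.8), the net genetic merit in the ESIM context is $H_E = \mathbf{w}_E'\mathbf{g}$
and the correlation between $H_E$ and $I$ is
$\rho_{H_EI} = \frac{\mathbf{w}_E'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{w}_E'\mathbf{C}\mathbf{w}_E}\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} = \frac{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}}}$.
Now, it is necessary to find the vector $\mathbf{b}$ that maximizes $\rho_{H_EI}$, which should be the
ESIM index vector of coefficients. Substituting $\mathbf{w}$ with $\mathbf{w}_E$ in Eq. (7.7), we get

$$\mathbf{C}\mathbf{b} - \frac{(\mathbf{w}_E'\mathbf{C}\mathbf{b})^2}{\sigma^2_I\sigma^2_H}\mathbf{P}\mathbf{b} = \mathbf{0}, \qquad (7.9)$$

where $\frac{(\mathbf{w}_E'\mathbf{C}\mathbf{b})^2}{\sigma^2_I\sigma^2_H} = \rho^2_{H_EI}$ is the square of the correlation between ESIM and
$H_E = \mathbf{w}_E'\mathbf{g}$. Let $\rho^2_{H_EI} = \lambda^2_E$; then Eq. (7.9) can be written as

$$(\mathbf{P}^{-1}\mathbf{C} - \lambda^2_E\mathbf{I})\mathbf{b}_E = \mathbf{0}, \qquad (7.10)$$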


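and the optimized ESIM index is $I_E = \mathbf{b}_E'\mathbf{y}$. Note that in Eq. (7.10) $\mathbf{P}^{-1}\mathbf{C}$ is the
multi-trait heritability. By Eqs. (7.8) and (7.10), the maximized correlation between
$H_E = \mathbf{w}_E'\mathbf{g}$ and $I_E = \mathbf{b}_E'\mathbf{y}$ (or ESIM accuracy) can be written as

$$\rho_{H_EI_E} = \frac{\sigma_{I_E}}{\sigma_{H_E}}, \qquad (7.11)$$

where $\sigma_{I_E} = \sqrt{\mathbf{b}_E'\mathbf{P}\mathbf{b}_E}$ is the standard deviation of $I_E = \mathbf{b}_E'\mathbf{y}$, and
$\sigma_{H_E} = \sqrt{\mathbf{b}_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}_E}$ is the standard deviation of $H_E = \mathbf{w}_E'\mathbf{g}$.
Hereafter, we write Eq. (7.11) as $\rho_E = \rho_{H_EI_E}$ or $\lambda_E = \rho_{H_EI_E}$ to simplify the notation.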

An additional restriction on Eq. (7.10) is $\mathbf{b}_E'\mathbf{b}_E = 1$, because $\rho_{H_EI_E}$ is invariant to the
scale change and because if $\mathbf{b}_E$ is an eigenvector of the multi-trait heritability matrix
$\mathbf{P}^{-1}\mathbf{C}$, vector $a\mathbf{b}_E$ is also an eigenvector of $\mathbf{P}^{-1}\mathbf{C}$ for all nonzero real values of $a$ (Mardia et al.
1982). This means that in the ESIM the magnitude of an eigenvector is unimportant;
only the direction matters (Watkins 2002). Equation (7.10) can also be written as
$\mathbf{C}\mathbf{b}_E = \lambda^2_E\mathbf{P}\mathbf{b}_E$, which is called the generalized eigenvalue problem (Watkins 2002).
In the latter case, $\mathbf{b}_E$ is called a generalized eigenvector and $\lambda^2_E$ a generalized
eigenvalue. The generalized eigenvalues may not exist; that is, they may be infinite.
However, if $\mathbf{P}$ is positive definite and has the same size as $\mathbf{C}$, all eigenvalues of $\mathbf{P}^{-1}\mathbf{C}$
exist and are finite (Gentle 2007). Matrix $\mathbf{P}$ is symmetric and positive definite and its
eigenvalues are different with a probability of 1 if the number of genotypes is higher
than the number of traits (Okamoto 1973).
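
The eigenproblem of Eq. (7.10) can be solved numerically with a few NumPy calls. The sketch below is an assumption-laden illustration, not the authors' code: it borrows the maize $\hat{\mathbf{P}}$ and $\hat{\mathbf{C}}$ of Sect. 6.3.2 purely as example input, and the function name `esim_first_eigenpair` is hypothetical. It takes the eigenvector of $\mathbf{P}^{-1}\mathbf{C}$ with the largest eigenvalue, scales it so that $\mathbf{b}_E'\mathbf{b}_E = 1$, and evaluates the accuracy of Eq. (7.11).

```python
import numpy as np

# Illustrative inputs: the maize P-hat and C-hat quoted in Sect. 6.3.2.
P = np.array([[0.45, 1.33, 2.33],
              [1.33, 65.07, 83.71],
              [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06],
              [0.61, 17.93, 22.75],
              [1.06, 22.75, 44.53]])


def esim_first_eigenpair(P, C):
    """Return (lambda_E^2, b_E): the largest eigenvalue of P^-1 C and its unit eigenvector."""
    T = np.linalg.solve(P, C)                    # P^-1 C without forming the inverse
    eigvals, eigvecs = np.linalg.eig(T)
    eigvals, eigvecs = np.real(eigvals), np.real(eigvecs)  # real in theory when P is positive definite
    j = int(np.argmax(eigvals))
    b = eigvecs[:, j]
    return eigvals[j], b / np.linalg.norm(b)     # enforce b_E' b_E = 1


lam2, b_E = esim_first_eigenpair(P, C)

sigma_I = np.sqrt(b_E @ P @ b_E)                           # sd of I_E
sigma_H = np.sqrt(b_E @ P @ np.linalg.solve(C, P @ b_E))   # sd of H_E, needs C nonsingular
accuracy = sigma_I / sigma_H                               # Eq. (7.11)
```

Since $\mathbf{C}\mathbf{b}_E = \lambda^2_E\mathbf{P}\mathbf{b}_E$, the computed `accuracy` coincides with $\sqrt{\lambda^2_E}$, and the sign of `b_E` returned by the solver is arbitrary, in line with the remark that only the direction of the eigenvector matters.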


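If the heritability of the ESIM is $h^2_I = \frac{\mathbf{b}'\mathbf{C}\mathbf{b}}{\mathbf{b}'\mathbf{P}\mathbf{b}}$, then another way of writing Eq. (7.1)
is

$$R = k_I\sigma_Ih^2_I = k_I\frac{\mathbf{b}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}, \qquad (7.12)$$

which is similar to the univariate breeder's equation (see Chap. 2, Eq. 2.4). All the
parameters of Eq. (7.12) were defined earlier.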


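The derivative of the ratio $\frac{\mathbf{b}'\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ (Eq. 7.12) with respect to $\mathbf{b}$ can be written as
$2(\mathbf{b}'\mathbf{P}\mathbf{b})^{1/2}\mathbf{C}\mathbf{b} - (\mathbf{b}'\mathbf{P}\mathbf{b})^{-1/2}(\mathbf{b}'\mathbf{C}\mathbf{b})\mathbf{P}\mathbf{b} = \mathbf{0}$, and, except by a proportionality constant,
the result is

$$(\mathbf{P}^{-1}\mathbf{C} - h^2_{I_E}\mathbf{I})\mathbf{b}_E = \mathbf{0}, \qquad (7.13)$$

where $h^2_{I_E} = \frac{\mathbf{b}_E'\mathbf{C}\mathbf{b}_E}{\mathbf{b}_E'\mathbf{P}\mathbf{b}_E}$ is the maximized ESIM heritability. Let $h^2_{I_E} = \rho^2_E = \lambda^2_E$; then
Eq. (7.13) is equal to Eq. (7.10) and can be written as $\mathbf{b}_E'\mathbf{C}\mathbf{b}_E = \lambda^2_E\mathbf{b}_E'\mathbf{P}\mathbf{b}_E$, whence
the maximized $\rho_E$ in terms of $h_{I_E}$ is

$$\rho_E = \sqrt{\frac{\mathbf{b}_E'\mathbf{C}\mathbf{b}_E}{\mathbf{b}_E'\mathbf{P}\mathbf{b}_E}} = h_{I_E}, \qquad (7.14)$$

which should give an equivalent result to that of Eq. (7.11).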
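By Eq. (7.11) and $\sigma_{H_E} = \sqrt{\mathbf{b}_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}_E}$, the maximized ESIM selection
response and expected genetic gain per trait can be written as

$$R_E = k_I\sqrt{\mathbf{b}_E'\mathbf{P}\mathbf{b}_E} \qquad (7.15)$$

and

$$\mathbf{E}_E = k_I\frac{\mathbf{C}\mathbf{b}_E}{\sqrt{\mathbf{b}_E'\mathbf{P}\mathbf{b}_E}} \qquad (7.16)$$

respectively. Equations (7.15) and (7.16) do not require the economic weights to be
known. In the original derivation of the ESIM, Cerón-Rojas et al. (2008) imposed the
restrictions $\sigma^2_{H_E} = 1$ and $\sigma^2_{I_E} = 1$. Under these restrictions, $\lambda_E = \mathbf{w}_E'\mathbf{C}\mathbf{b}_E$ and
Eq. (7.15) can be written as $R_E = k_I\lambda_E$. When $\sigma^2_{H_E} \neq 1$, Eq. (7.15) is equal to
$R_E = k_I\sigma_{H_E}\lambda_E$, where $\sigma_{H_E} = \sqrt{\mathbf{b}_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}_E}$ and $\lambda^2_E = \rho^2_E = h^2_{I_E}$.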

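Let $\mathbf{T} = \mathbf{P}^{-1}\mathbf{C}$ and $\lambda^2_E = h^2_{I_E}$; then, Eq. (7.13) can be written as $\mathbf{T}\mathbf{I}\mathbf{b}_E = \lambda^2_E\mathbf{I}\mathbf{b}_E$,
where $\mathbf{I} = \mathbf{F}^{-1}\mathbf{F}$ is an identity matrix of size $t \times t$ ($t$ = number of traits), and
$\mathbf{F} = \mathrm{diag}\{f_1 \;\; f_2 \;\; \cdots \;\; f_t\}$ is a diagonal matrix with values equal to any real
number, except zero values. Thus, another way of writing Eqs. (7.10) and (7.13) is

$$(\mathbf{T}_2 - \lambda^2_E\mathbf{I})\boldsymbol{\beta} = \mathbf{0}, \qquad (7.17)$$

where $\mathbf{T}_2 = \mathbf{F}\mathbf{T}\mathbf{F}^{-1}$ and $\boldsymbol{\beta} = \mathbf{F}\mathbf{b}_E$; $\mathbf{T}$ and $\mathbf{T}_2 = \mathbf{F}\mathbf{T}\mathbf{F}^{-1}$ are similar matrices and both
have the same eigenvalues but different eigenvectors (Harville 1997). When the
$\mathbf{F}$ values are only 1s, vector $\mathbf{b}_E$ is not affected; when the $\mathbf{F}$ values are only $-1$s,
vector $\mathbf{b}_E$ changes its direction, and if the $\mathbf{F}$ values are different from 1 and $-1$,
matrix $\mathbf{F}$ changes the proportional values of $\mathbf{b}_E$. In practice, $\mathbf{b}_E$ is first obtained from
Eq. (7.13) and then multiplied by matrix $\mathbf{F}$ to obtain $\boldsymbol{\beta} = \mathbf{F}\mathbf{b}_E$, that is, $\boldsymbol{\beta}$ is a linear
transformation of $\mathbf{b}_E$. Matrix $\mathbf{T}_2 = \mathbf{F}\mathbf{T}\mathbf{F}^{-1}$ is called the similarity transformation,
and matrix $\mathbf{F}$ is called the transforming matrix (Watkins 2002). Cerón-Rojas et al.
(2006) introduced an alternative procedure for modifying the $\mathbf{b}_E$ signs that is a
particular case of Eq. (7.17). Vector $\boldsymbol{\beta} = \mathbf{F}\mathbf{b}_E$ can substitute $\mathbf{b}_E$ in Eqs. (7.15) and
(7.16); and in this case, the optimized ESIM index should be written as $I_E = \boldsymbol{\beta}'\mathbf{y}$.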
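
The similarity transformation of Eq. (7.17) is easy to check numerically. In the sketch below, the diagonal of `F` is an arbitrary choice made only for illustration, the maize $\hat{\mathbf{P}}$ and $\hat{\mathbf{C}}$ of Sect. 6.3.2 are reused as example input, and the variable names are assumptions; it is a minimal demonstration, not the authors' procedure.

```python
import numpy as np

# Illustrative inputs: the maize P-hat and C-hat quoted in Sect. 6.3.2.
P = np.array([[0.45, 1.33, 2.33],
              [1.33, 65.07, 83.71],
              [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06],
              [0.61, 17.93, 22.75],
              [1.06, 22.75, 44.53]])

T = np.linalg.solve(P, C)                       # multi-trait heritability P^-1 C
vals, vecs = np.linalg.eig(T)
vals, vecs = np.real(vals), np.real(vecs)
j = int(np.argmax(vals))
lam2, b_E = vals[j], vecs[:, j]                 # first eigenpair of T

# Hypothetical transforming matrix: keep trait 1, flip trait 2, rescale trait 3.
F = np.diag([1.0, -1.0, 2.0])
T2 = F @ T @ np.linalg.inv(F)                   # similarity transformation
beta = F @ b_E                                  # transformed coefficient vector

vals_T2 = np.sort(np.real(np.linalg.eigvals(T2)))
```

Here `beta` is an eigenvector of `T2` for the same eigenvalue `lam2`, and the spectra of `T` and `T2` coincide, which is the property that Eq. (7.17) relies on.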


7.1.2 Statistical ESIM Properties 


The ratio of the index accuracies and the variance of the predicted error (VPE) are 
good criteria for comparing the index efficiencies for predicting the net genetic merit 
(see Chap. 2 for details). In Eq. (7.11), we obtained the accuracy of the ESIM; now, 
we derive the VPE of the ESIM. 

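The variance of $I_E = \mathbf{b}_E'\mathbf{y}$ ($\sigma^2_{I_E}$) and the covariance between $H_E = \mathbf{w}_E'\mathbf{g}$ and
$I_E = \mathbf{b}_E'\mathbf{y}$ ($\sigma_{H_EI_E}$) are the same, that is,

$$\sigma^2_{I_E} = \mathbf{b}_E'\mathbf{P}\mathbf{b}_E \text{ and } \sigma_{H_EI_E} = \mathbf{w}_E'\mathbf{C}\mathbf{b}_E = \mathbf{b}_E'\mathbf{P}\mathbf{C}^{-1}\mathbf{C}\mathbf{b}_E = \mathbf{b}_E'\mathbf{P}\mathbf{b}_E, \qquad (7.18)$$

respectively; that is, $\sigma^2_{I_E} = \sigma_{H_EI_E}$. By Eq. (7.18), the VPE of the ESIM can be written
as

$$E[(H_E - I_E)^2] = \sigma^2_{H_E} + \sigma^2_{I_E} - 2\sigma_{H_EI_E} = \sigma^2_{H_E} - \sigma^2_{I_E} = (1 - \rho^2_E)\sigma^2_{H_E}. \qquad (7.19)$$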
The relative effectiveness of $I_E = \mathbf{b}_E'\mathbf{y}$ in predicting $H_E = \mathbf{w}_E'\mathbf{g}$ is the ratio of
$(1 - \rho^2_E)\sigma^2_{H_E}$ over $\sigma^2_{H_E}$, i.e., $1 - \rho^2_E$; thus, the greater $\rho^2_E$ is, the more effective
$I_E = \mathbf{b}_E'\mathbf{y}$ is at predicting $H_E = \mathbf{w}_E'\mathbf{g}$. The mean squared effect of $I_E$ on $H_E$, or the
total variance of $H_E$ explained by $I_E$, is

$$\sigma^2_{I_E} = \rho^2_E\sigma^2_{H_E}, \qquad (7.20)$$

and the relative mean squared effect can be measured by $\rho^2_E$ (Anderson 2003). If in
Eq. (7.20) $\rho^2_E = 1$, $\sigma^2_{I_E} = \sigma^2_{H_E}$, and if $\rho^2_E = 0$, $\sigma^2_{I_E} = 0$. That is, the variance of $H_E$
explained by $I_E$ is proportional to $\rho^2_E$, and when $\rho^2_E$ is close to 1, $\sigma^2_{I_E}$ is close to $\sigma^2_{H_E}$, and
if $\rho^2_E$ is close to 0, $\sigma^2_{I_E}$ is close to 0. All these results are valid for any index associated
with the ESIM, such as the restricted ESIM (RESIM) and the predetermined
proportional gains ESIM (PPG-ESIM), which are described in the following sections
of this chapter.


7.1.3 The ESIM and the Canonical Correlation Theory 


Canonical correlation theory describes the associations between two sets of variables
(Hotelling 1935, 1936) and searches for linear combinations, called canonical
variables, of each of two sets of variables having maximal correlation. The vector of
coefficients of these linear combinations is called the canonical vector and the
correlation between the canonical variables is called the canonical correlation
(Wilms and Croux 2016).

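To see how the ESIM and the canonical correlation theory are related, note that
vectors $\mathbf{y}$ and $\mathbf{g}$ (Eq. 7.1) can be ordered in a new vector $\mathbf{x}$ as $\mathbf{x}' = [\mathbf{y}' \;\; \mathbf{g}']$, whence
the covariance matrix of $\mathbf{x}$ is $\begin{bmatrix} \mathbf{P} & \mathbf{C} \\ \mathbf{C} & \mathbf{C} \end{bmatrix}$. One measure of the association between the $j$th
linear combination of $\mathbf{y}$ ($I_{E_j} = \mathbf{b}_{E_j}'\mathbf{y}$) and the $j$th linear combination of $\mathbf{g}$ ($H_{E_j} = \mathbf{w}_{E_j}'\mathbf{g}$) is
the $j$th canonical correlation ($\lambda_{E_j}$) value obtained from equation
$(\mathbf{P}^{-1}\mathbf{C} - \lambda^2_{E_j}\mathbf{I})\mathbf{b}_{E_j} = \mathbf{0}$,
where $\mathbf{b}_{E_j}$ is the $j$th canonical vector ($j = 1, 2, \ldots, t$) of matrix $\mathbf{P}^{-1}\mathbf{C}$, and
$\mathbf{w}_{E_j} = \mathbf{C}^{-1}\mathbf{P}\mathbf{b}_{E_j}$. Thus, in the canonical correlation context, $I_{E_j} = \mathbf{b}_{E_j}'\mathbf{y}$ and
$H_{E_j} = \mathbf{w}_{E_j}'\mathbf{g}$ are canonical variables.

In the ESIM, the first eigenvector ($\mathbf{b}_{E_1}$) of matrix $\mathbf{P}^{-1}\mathbf{C}$ should be used on
$I_{E_1} = \mathbf{b}_{E_1}'\mathbf{y}$; the first eigenvalue ($\lambda^2_{E_1}$) and $\mathbf{b}_{E_1}$ of $\mathbf{P}^{-1}\mathbf{C}$ should be used on the ESIM
selection response and on the ESIM expected genetic gain per trait, because, in this
case, the ESIM has maximum accuracy compared with other indices, such as the
LPSI. The latter results in this subsection imply that the sampling statistical properties
associated with the canonical correlation theory are also valid for the ESIM.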
7.1.4 Estimated ESIM Parameters and Their Sampling
Properties 


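The estimated covariance matrix of the true breeding values ($\mathbf{C}$) and that of the trait
phenotypic values ($\mathbf{P}$) are denoted as $\hat{\mathbf{C}}$ and $\hat{\mathbf{P}}$ respectively; they can be obtained by
restricted maximum likelihood using Eqs. (2.22) to (2.24) described in Chap. 2. With
matrices $\hat{\mathbf{C}}$ and $\hat{\mathbf{P}}$, we constructed matrix $\hat{\mathbf{T}} = \hat{\mathbf{P}}^{-1}\hat{\mathbf{C}}$ and equation

$$(\hat{\mathbf{T}} - \hat{\lambda}^2_{E_j}\mathbf{I})\hat{\mathbf{b}}_{E_j} = \mathbf{0}, \qquad (7.21)$$

$j = 1, 2, \ldots, t$, where $t$ is the number of traits in the ESIM index. Note that $\hat{\lambda}^2_{E_j}$ is
positive only if $\hat{\mathbf{P}}$ is positive definite (all eigenvalues positive) and $\hat{\mathbf{C}}$ is positive
semidefinite (no negative eigenvalues); in addition, as $\hat{\mathbf{P}}^{-1}\hat{\mathbf{C}}$ is an asymmetric
matrix, the values of $\hat{\mathbf{b}}_{E_j}$ and $\hat{\lambda}^2_{E_j}$ should be obtained using the singular value
decomposition (SVD) theory (Anderson 2003).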

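Matrix $\hat{\mathbf{T}}$ is square and asymmetric of order $t \times t$ and rank $q \leq \min(p, c)$,
where $p$ and $c$ denote the rank of $\hat{\mathbf{P}}^{-1}$ and $\hat{\mathbf{C}}$ respectively; the rank of $\hat{\mathbf{T}}$ is equal to
$c$ only if $\hat{\mathbf{C}}$ is square and nonsingular. Thus, matrix $\hat{\mathbf{T}}$ has a maximum of $q$ eigenvalues
different from zero (Rao 2002). In addition, $\hat{\mathbf{T}}\hat{\mathbf{T}}'$ and $\hat{\mathbf{T}}'\hat{\mathbf{T}}$ are symmetric matrices, but
$\hat{\mathbf{T}}\hat{\mathbf{T}}' \neq \hat{\mathbf{T}}'\hat{\mathbf{T}}$. Using the SVD theory, matrix $\hat{\mathbf{T}}$ can be written as

$$\hat{\mathbf{T}} = \mathbf{V}_1\mathbf{L}^{1/2}\mathbf{V}_2', \qquad (7.22)$$

where $\mathbf{V}_1$ ($\mathbf{V}_1'\mathbf{V}_1 = \mathbf{V}_1\mathbf{V}_1' = \mathbf{I}$) and $\mathbf{V}_2$ ($\mathbf{V}_2'\mathbf{V}_2 = \mathbf{V}_2\mathbf{V}_2' = \mathbf{I}$) are matrices with the
eigenvectors of matrices $\hat{\mathbf{T}}\hat{\mathbf{T}}'$ and $\hat{\mathbf{T}}'\hat{\mathbf{T}}$ respectively; $\mathbf{L}^{1/2}$ is a diagonal matrix with
the square roots of the eigenvalues (ordered from largest to smallest, all nonnegative)
of either $\hat{\mathbf{T}}\hat{\mathbf{T}}'$ or $\hat{\mathbf{T}}'\hat{\mathbf{T}}$ (the eigenvalues of $\hat{\mathbf{T}}\hat{\mathbf{T}}'$ and $\hat{\mathbf{T}}'\hat{\mathbf{T}}$ are the same). The diagonal
entries of $\mathbf{L}^{1/2}$ are uniquely determined, and they are called the singular values of $\hat{\mathbf{T}}$. The
columns of $\mathbf{V}_1$ are orthonormal vectors called left singular vectors of $\hat{\mathbf{T}}$, and the
columns of $\mathbf{V}_2$ are called right singular vectors (Watkins 2002).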

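Estimators $\hat{\mathbf{b}}_{E_1}$ and $\hat{\lambda}^2_{E_1}$ of the first eigenvector $\mathbf{b}_{E_1}$ and the first eigenvalue $\lambda^2_{E_1}$
respectively are the first column of matrix $\mathbf{V}_1$ and the first diagonal element of matrix
$\mathbf{L}^{1/2}$. Thus, because $\hat{\mathbf{T}}\hat{\mathbf{T}}'$ is a symmetric matrix, the maximum likelihood estimators
of $\lambda^2_{E_j}$ and $\mathbf{b}_{E_j}$ in the ESIM context can be obtained from

$$(\hat{\mathbf{T}}\hat{\mathbf{T}}' - \hat{\mu}_j\mathbf{I})\hat{\mathbf{b}}_{E_j} = \mathbf{0}, \qquad (7.23)$$

where $\hat{\mu}_j$ is the $j$th eigenvalue of $\hat{\mathbf{T}}\hat{\mathbf{T}}'$ (its square root is the $j$th diagonal element of
$\mathbf{L}^{1/2}$), $j = 1, 2, \ldots, t$. In the asymptotic context, $\hat{\lambda}^2_{E_j}$ and $\hat{\mathbf{b}}_{E_j}$ are consistent
and unbiased estimators (Anderson 2003).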


The latter results allow the ESIM index ($I_E = \mathbf{b}_E'\mathbf{y}$) to be estimated as $\hat{I}_E = \hat{\mathbf{b}}_E'\mathbf{y}$.
The estimators of the maximized ESIM selection response and expected genetic gain
per trait are $\hat{R}_E = k_I\sqrt{\hat{\mathbf{b}}_{E_1}'\hat{\mathbf{P}}\hat{\mathbf{b}}_{E_1}}$ and $\hat{\mathbf{E}}_E = k_I\frac{\hat{\mathbf{C}}\hat{\mathbf{b}}_{E_1}}{\sqrt{\hat{\mathbf{b}}_{E_1}'\hat{\mathbf{P}}\hat{\mathbf{b}}_{E_1}}}$ respectively, whereas the
estimator of the maximized ESIM accuracy is $\hat{\lambda}_{E_1}$, which should be similar to the
estimator of the square root of the maximized ESIM heritability.
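
The SVD route of Eqs. (7.22) and (7.23) can be sketched with `numpy.linalg.svd`. This is a minimal illustration under stated assumptions: the maize $\hat{\mathbf{P}}$ and $\hat{\mathbf{C}}$ of Sect. 6.3.2 are reused as example input, the variable names are hypothetical, and the sign of each singular vector returned by the routine is arbitrary.

```python
import numpy as np

# Illustrative inputs: the maize P-hat and C-hat quoted in Sect. 6.3.2.
P_hat = np.array([[0.45, 1.33, 2.33],
                  [1.33, 65.07, 83.71],
                  [2.33, 83.71, 165.99]])
C_hat = np.array([[0.07, 0.61, 1.06],
                  [0.61, 17.93, 22.75],
                  [1.06, 22.75, 44.53]])

T_hat = np.linalg.solve(P_hat, C_hat)           # T-hat = P-hat^-1 C-hat

# T-hat = V1 diag(s) V2'; s holds the singular values in decreasing order.
V1, s, V2_t = np.linalg.svd(T_hat)
b1 = V1[:, 0]                                   # first left singular vector (unit norm)

k_I = 1.755                                     # 10% selection intensity
sigma_I = np.sqrt(b1 @ P_hat @ b1)
R_hat = k_I * sigma_I                           # estimated selection response
E_hat = k_I * (C_hat @ b1) / sigma_I            # estimated gain per trait
```

The first column of `V1` satisfies Eq. (7.23) with $\hat{\mu}_1$ equal to `s[0] ** 2`, i.e., it is an eigenvector of $\hat{\mathbf{T}}\hat{\mathbf{T}}'$ associated with its largest eigenvalue.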
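In the asymptotic context, the estimator of $\mathbf{b}_{E_j}$ ($\hat{\mathbf{b}}_{E_j}$) has multivariate normal
distribution with expectation $E(\hat{\mathbf{b}}_{E_j}) = \mathbf{b}_{E_j}$ and variance

$$\mathrm{Var}(\hat{\mathbf{b}}_{E_j}) = \frac{1}{2n}\mathbf{b}_{E_j}\mathbf{b}_{E_j}' + \frac{1}{n}(1 - \lambda^2_{E_j})\sum_{i \neq j}^{t}\frac{\lambda^2_{E_i} + \lambda^2_{E_j} - 2\lambda^2_{E_i}\lambda^2_{E_j}}{(\lambda^2_{E_j} - \lambda^2_{E_i})^2}\mathbf{b}_{E_i}\mathbf{b}_{E_i}', \qquad (7.24)$$

and, for $i \neq j$, the covariance between $\hat{\mathbf{b}}_{E_i}$ and $\hat{\mathbf{b}}_{E_j}$ can be written as

$$\mathrm{Cov}(\hat{\mathbf{b}}_{E_i}, \hat{\mathbf{b}}_{E_j}) = \frac{(1 - \lambda^2_{E_i})(1 - \lambda^2_{E_j})(\lambda^2_{E_i} + \lambda^2_{E_j})}{n(\lambda^2_{E_j} - \lambda^2_{E_i})^2}\mathbf{b}_{E_j}\mathbf{b}_{E_i}', \qquad (7.25)$$

where $n$ is the number of individuals or genotypes (Anderson 1999). The variance of
$\hat{\mathbf{b}}_{E_j}$ and the covariance between $\hat{\mathbf{b}}_{E_i}$ and $\hat{\mathbf{b}}_{E_j}$ depend not only on $n$, but also on
eigenvalues $\lambda^2_{E_i}$ and $\lambda^2_{E_j}$. Suppose that $\lambda^2_{E_j} > \lambda^2_{E_i}$; then, when $\lambda^2_{E_j}$ is very close to 1,
$\mathrm{Var}(\hat{\mathbf{b}}_{E_j}) \approx \frac{1}{2n}\mathbf{b}_{E_j}\mathbf{b}_{E_j}'$ ("$\approx$" denotes an approximation) and $\mathrm{Cov}(\hat{\mathbf{b}}_{E_i}, \hat{\mathbf{b}}_{E_j})$ is very close
to 0. By the result of Eq. (7.24), the variance of the first eigenvector ($\hat{\mathbf{b}}_{E_1}$) of $\mathbf{P}^{-1}\mathbf{C}$
can be written as

$$\mathrm{Var}(\hat{\mathbf{b}}_{E_1}) = \frac{1}{2n}\mathbf{b}_{E_1}\mathbf{b}_{E_1}' + \frac{1}{n}(1 - \lambda^2_{E_1})\sum_{j=2}^{t}\frac{\lambda^2_{E_j} + \lambda^2_{E_1} - 2\lambda^2_{E_j}\lambda^2_{E_1}}{(\lambda^2_{E_1} - \lambda^2_{E_j})^2}\mathbf{b}_{E_j}\mathbf{b}_{E_j}'.$$

If the first eigenvalue $\lambda^2_{E_1}$ of $\mathbf{P}^{-1}\mathbf{C}$ is very close to 1 ($\lambda^2_{E_1} \approx 1$),
$\mathrm{Var}(\hat{\mathbf{b}}_{E_1}) \approx \frac{1}{2n}\mathbf{b}_{E_1}\mathbf{b}_{E_1}'$ and $\mathrm{Cov}(\hat{\mathbf{b}}_{E_1}, \hat{\mathbf{b}}_{E_j}) \approx \mathbf{0}$.

In the asymptotic context, the $j$th estimator ($\hat{\lambda}_{E_j}$) of the canonical correlations has
normal distribution with expectation $E(\hat{\lambda}_{E_j}) \approx \lambda_{E_j}$ and variance

$$\mathrm{Var}(\hat{\lambda}_{E_j}) \approx \frac{(1 - \lambda^2_{E_j})^2}{n}, \qquad (7.26)$$

whereas the $j$th estimator of the square of the canonical correlations ($\hat{\lambda}^2_{E_j}$) has normal
distribution with expectation $E(\hat{\lambda}^2_{E_j}) \approx \lambda^2_{E_j}$ and variance

$$\mathrm{Var}(\hat{\lambda}^2_{E_j}) \approx \frac{4\lambda^2_{E_j}(1 - \lambda^2_{E_j})^2}{n}. \qquad (7.27)$$
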
In addition, for $i \neq j$, the correlation between $\hat{\lambda}_{E_i}^2$ and $\hat{\lambda}_{E_j}^2$ is zero, i.e.,
$\mathrm{Corr}(\hat{\lambda}_{E_i}^2, \hat{\lambda}_{E_j}^2) = 0$ (Bilodeau and Brenner 1999; Muirhead 2005).

Equation (7.26) implies that under the restrictions $\sigma_H^2 = 1$ and $\sigma_I^2 = 1$, the
expectation and variance of $\hat{R}_E = k_I\hat{\lambda}_E$ are $E(\hat{R}_E) \approx k_I\lambda_E$ and
$\mathrm{Var}(\hat{R}_E) \approx \dfrac{k_I^2\left(1-\lambda_E^2\right)^2}{n}$ respectively. However, obtaining the expectation and variance of
$\hat{R}_E = k_I\hat{\sigma}_H\hat{\lambda}_E$ or $\hat{R}_E = k_I\sqrt{\hat{b}_E'\hat{P}\hat{b}_E}$ is more difficult, because in both equations there are
two estimators: $\hat{\sigma}_H$ and $\hat{\lambda}_E$ in the first one, and $\hat{P}$ and $\hat{b}_E$ in the second one.
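
To give a feel for the size of these asymptotic variances, the following minimal Python sketch evaluates Eqs. (7.26) and (7.27) and the variance of $\hat{R}_E = k_I\hat{\lambda}_E$ under $\sigma_H^2 = \sigma_I^2 = 1$. The function names are ours, and the inputs ($\lambda_E^2 = 0.4599$, $n = 3330$, $k_I = 1.755$) are borrowed from the poultry example of Sect. 7.1.5 purely for illustration; they are plugged in as if they were the true parameter values, which is an assumption.

```python
import math


def var_canonical_corr(lam_sq: float, n: int) -> float:
    """Asymptotic variance of the estimated canonical correlation, Eq. (7.26)."""
    return (1.0 - lam_sq) ** 2 / n


def var_canonical_corr_sq(lam_sq: float, n: int) -> float:
    """Asymptotic variance of the estimated squared canonical correlation, Eq. (7.27)."""
    return 4.0 * lam_sq * (1.0 - lam_sq) ** 2 / n


# Illustrative inputs (assumed to play the role of the true parameters).
lam_sq, n, k_i = 0.4599, 3330, 1.755

var_lambda = var_canonical_corr(lam_sq, n)
var_lambda_sq = var_canonical_corr_sq(lam_sq, n)

# Under sigma_H^2 = sigma_I^2 = 1, R_E = k_I * lambda_E, so Var(R_E) = k_I^2 * Var(lambda_E).
var_response = k_i ** 2 * var_lambda
se_response = math.sqrt(var_response)

print(f"Var(lambda_hat)   = {var_lambda:.3e}")
print(f"Var(lambda_hat^2) = {var_lambda_sq:.3e}")
print(f"SE(R_hat_E)       = {se_response:.4f}")
```

Because both variances shrink at rate $1/n$ and carry the factor $(1-\lambda_E^2)^2$, they become small for large samples or for canonical correlations close to 1.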


7.1.5 Numerical Examples 


We compare ESIM efficiency versus LPSI efficiency using a real data set from
commercial egg poultry lines obtained from Akbar et al. (1984). The estimated
phenotypic ($\hat{P}$) and genetic ($\hat{C}$) covariance matrices among the rate of lay (RL,
number of eggs), age at sexual maturity (SM, days) and egg weight (EW, kg), were

$$\hat{P} = \begin{bmatrix} 240.57 & -95.62 & 2.07 \\ -95.62 & 167.20 & 4.58 \\ 2.07 & 4.58 & 22.80 \end{bmatrix} \quad\text{and}\quad \hat{C} = \begin{bmatrix} 29.86 & -17.90 & -4.13 \\ -17.90 & 18.56 & 1.49 \\ -4.13 & 1.49 & 9.24 \end{bmatrix}$$

respectively. The number of genotypes and the vector of economic weights were $n = 3330$
and $w' = [19.54 \;\; -3.56 \;\; 17.01]$ respectively, whereas the selection intensity was
10% ($k_I = 1.755$) for both indices.
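The estimated LPSI vector of coefficients was $\hat{b}_S' = w'\hat{C}\hat{P}^{-1} = [1.82 \;\; -1.38 \;\; 3.25]$,
whereas the estimated selection response, expected
genetic gain per trait, accuracy, and heritability of the LPSI were

$$\hat{R}_S = 1.755\sqrt{\hat{b}_S'\hat{P}\hat{b}_S} = 74.91, \qquad \hat{E}_S' = 1.755\frac{\hat{b}_S'\hat{C}}{\sqrt{\hat{b}_S'\hat{P}\hat{b}_S}} = [2.70 \;\; -2.20 \;\; 0.84],$$

$$\hat{\rho}_S = \frac{\sqrt{\hat{b}_S'\hat{P}\hat{b}_S}}{\sqrt{w'\hat{C}w}} = 0.362, \quad\text{and}\quad \hat{h}_S^2 = \frac{\hat{b}_S'\hat{C}\hat{b}_S}{\hat{b}_S'\hat{P}\hat{b}_S} = 0.143$$

respectively.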
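
The LPSI figures above can be checked numerically. The following is a minimal sketch in Python with NumPy (our choice of tool; the text does not prescribe software), using only the matrices $\hat{P}$ and $\hat{C}$, the weights $w$ and $k_I = 1.755$ given in the text. Variable names are ours, and small differences in the last reported digit are to be expected from rounding of the published inputs.

```python
import numpy as np

# Estimated phenotypic (P) and genetic (C) covariance matrices (Akbar et al. 1984),
# traits ordered as RL, SM, EW.
P = np.array([[240.57, -95.62, 2.07],
              [-95.62, 167.20, 4.58],
              [2.07, 4.58, 22.80]])
C = np.array([[29.86, -17.90, -4.13],
              [-17.90, 18.56, 1.49],
              [-4.13, 1.49, 9.24]])
w = np.array([19.54, -3.56, 17.01])   # economic weights
k_i = 1.755                           # selection intensity for 10% selected

# LPSI coefficients: b_S = P^{-1} C w (solve avoids forming the inverse explicitly).
b_s = np.linalg.solve(P, C @ w)

sd_index = np.sqrt(b_s @ P @ b_s)                  # standard deviation of the index
response = k_i * sd_index                          # selection response
gain = k_i * (C @ b_s) / sd_index                  # expected genetic gain per trait
accuracy = sd_index / np.sqrt(w @ C @ w)           # correlation between index and H
heritability = (b_s @ C @ b_s) / (b_s @ P @ b_s)   # index heritability

print(np.round(b_s, 2), round(response, 2), np.round(gain, 2))
print(round(accuracy, 3), round(heritability, 3))
```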

Note that because in the ESIM context $\hat{b}_E'\hat{b}_E = 1$, the best way of comparing
ESIM results versus LPSI results is when the LPSI coefficient vector is normalized,
i.e., when the LPSI coefficient vector is equal to $\hat{b}_* = \hat{b}_S/\hat{b}_S'\hat{b}_S$ and then $\hat{b}_*'\hat{b}_* = 1$;
however, it can be shown that the normalization process only affects the
estimated LPSI selection response because in that case, $\hat{R}_S = 74.91$ is divided by
$\hat{b}_S'\hat{b}_S$. For example, for this data set, $\hat{b}_S'\hat{b}_S = 15.76$; then, the estimated LPSI
selection response using $\hat{b}_* = \hat{b}_S/\hat{b}_S'\hat{b}_S$ is $\hat{R}_S = \dfrac{74.91}{15.76} = 4.75$, whereas the rest of
the estimated LPSI parameters are the same. When $0 < \hat{b}_S'\hat{b}_S < 1$ and $1 < \hat{R}_S$, the
values of $\hat{R}_S$ increase, but when $1 < \hat{b}_S'\hat{b}_S$, the values of $\hat{R}_S$ decrease, as in the
example.


The product $\hat{b}_S'\hat{b}_S$ does not affect $\hat{\rho}_S$ because it is invariant to scale change. Also,
$\hat{b}_S'\hat{b}_S$ does not affect $\hat{h}_S^2$ and $\hat{E}_S$ because $\hat{b}_S'\hat{b}_S$ appears in the numerator and
denominator of both estimated parameters.

In the ESIM, the sign and proportion of the expected genetic gain values for traits
RL, SM, and EW should be in accordance with the breeder's interest. For example, if
the breeder's interest is that the expected genetic gain per trait for RL should be
positive and negative for SM, the sign and proportion of the values of the first
eigenvector should be modified using a linear combination of the estimated first
eigenvector $\hat{b}_{E_1}$, i.e., $\hat{\beta} = F\hat{b}_{E_1}$, to achieve expected genetic gain per trait values in
RL and SM according to the breeder's interest.

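The information needed to obtain the estimated ESIM parameters are matrices

$$\hat{T} = \hat{P}^{-1}\hat{C} = \begin{bmatrix} 0.1102 & -0.0405 & -0.0280 \\ -0.0390 & 0.0864 & -0.0184 \\ -0.1833 & 0.0517 & 0.4115 \end{bmatrix} \quad\text{and}\quad \hat{T}\hat{T}' = \begin{bmatrix} 0.0146 & -0.0073 & -0.0338 \\ -0.0073 & 0.0093 & 0.0041 \\ -0.0338 & 0.0041 & 0.2056 \end{bmatrix}.$$

We need to find the eigenvalues and eigenvectors of equation $(\hat{T}\hat{T}' - \hat{\mu}_j I)\hat{b}_{E_j} = 0$,
where $\hat{\mu}_j = \hat{\lambda}_{E_j}^4$, to obtain matrices $\hat{V}_1$ and $\hat{L}^{1/2}$, which form matrix $\hat{T} = \hat{V}_1\hat{L}^{1/2}\hat{V}_2'$.
Matrix $\hat{V}_1$ is equal to

$$\hat{V}_1 = \begin{bmatrix} -0.1701 & 0.6818 & 0.7115 \\ 0.0259 & -0.7187 & 0.6948 \\ 0.9851 & 0.1366 & 0.1046 \end{bmatrix},$$

whereas the diagonal elements of matrix $\hat{L}$ are 0.2115, 0.0155, and 0.0025, that is, matrix

$$\hat{L}^{1/2} = \begin{bmatrix} 0.4599 & 0 & 0 \\ 0 & 0.1244 & 0 \\ 0 & 0 & 0.0498 \end{bmatrix}.$$

Thus $\hat{\mu}_1 = \hat{\lambda}_{E_1}^4 = 0.2115$, $\hat{\lambda}_{E_1}^2 = 0.4599$,
and the estimated ESIM accuracy was $\hat{\lambda}_{E_1} = 0.6782$. The estimated ESIM eigenvector
of coefficients is the first column of matrix $\hat{V}_1$, i.e.,
$\hat{b}_{E_1}' = [-0.1701 \;\; 0.0259 \;\; 0.9851]$, and the estimated ESIM index can be
constructed as $\hat{I}_E = -0.1701\,\mathrm{RL} + 0.0259\,\mathrm{SM} + 0.9851\,\mathrm{EW}$.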

The estimated ESIM selection response and expected genetic gain per trait were

$$\hat{R}_E = 1.755\sqrt{\hat{b}_{E_1}'\hat{P}\hat{b}_{E_1}} = 9.54 \quad\text{and}\quad \hat{E}_E' = 1.755\frac{\hat{b}_{E_1}'\hat{C}}{\sqrt{\hat{b}_{E_1}'\hat{P}\hat{b}_{E_1}}} = [-3.10 \;\; 1.61 \;\; 3.18]$$

respectively. Because the estimated LPSI selection response was
$\hat{R}_S = \dfrac{74.91}{15.76} = 4.75$, the estimated ESIM selection response was higher than the
estimated LPSI response. In addition, the estimated LPSI expected genetic gain
per trait was $\hat{E}_S' = [2.70 \;\; -2.20 \;\; 0.84]$. Now, suppose that the breeder's interest is to
increase RL and decrease SM; then, $\hat{E}_S'$ is a good result but $\hat{E}_E'$ is wrong.
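
The ESIM estimation steps of this example (form $\hat{T} = \hat{P}^{-1}\hat{C}$, decompose $\hat{T}\hat{T}'$, take the leading eigenvector) can be sketched in Python with NumPy as follows. This is an illustrative sketch under our own naming, not the authors' software. Because the sign of an eigenvector is arbitrary, the code fixes it so that the EW coefficient is positive, which matches the vector reported in the text.

```python
import numpy as np

P = np.array([[240.57, -95.62, 2.07],
              [-95.62, 167.20, 4.58],
              [2.07, 4.58, 22.80]])
C = np.array([[29.86, -17.90, -4.13],
              [-17.90, 18.56, 1.49],
              [-4.13, 1.49, 9.24]])
k_i = 1.755

T = np.linalg.solve(P, C)                  # T = P^{-1} C
TTt = T @ T.T                              # symmetric matrix of Eq. (7.23)

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(TTt)
mu_1 = eigvals[-1]                         # largest eigenvalue, mu_1 = lambda_E1^4
b_e = eigvecs[:, -1]                       # unit-length leading eigenvector
if b_e[2] < 0:                             # sign convention: EW coefficient positive
    b_e = -b_e

lambda_sq = np.sqrt(mu_1)                  # lambda_E1^2
accuracy = np.sqrt(lambda_sq)              # lambda_E1, the estimated ESIM accuracy

sd_index = np.sqrt(b_e @ P @ b_e)
response = k_i * sd_index                  # estimated ESIM selection response
gain = k_i * (C @ b_e) / sd_index          # estimated expected genetic gain per trait

print(np.round(b_e, 4), round(lambda_sq, 4), round(accuracy, 4))
print(round(response, 2), np.round(gain, 2))
```

The negative gain for RL and positive gain for SM in `gain` are what motivates the transformation by matrix $F$ discussed in the text.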


We can change the sign and proportion of $\hat{E}_E'$ by transforming $\hat{b}_{E_1}$ into $\hat{\beta} = F\hat{b}_{E_1}$
using a convenient matrix $F$ such as

$$F = \begin{bmatrix} -9 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

In such a case $\hat{\beta}' = \hat{b}_{E_1}'F' = [1.531 \;\; 0.026 \;\; 0.985]$, $\hat{R}_E = 1.755\sqrt{\hat{\beta}'\hat{P}\hat{\beta}} = 42.44$, and
$\hat{E}_E' = 1.755\dfrac{\hat{\beta}'\hat{C}}{\sqrt{\hat{\beta}'\hat{P}\hat{\beta}}} = [2.990 \;\; -1.85 \;\; 0.205]$. However, vector $\hat{\beta}$ was not normalized.
To normalize $\hat{\beta}$ we need to divide it by $\hat{\beta}'\hat{\beta} = 3.314$, but $\hat{\beta}'\hat{\beta}$ should only affect
$\hat{R}_E = 42.44$, which should be divided by 3.314, that is, $\hat{R}_E = \dfrac{42.44}{3.314} = 12.806$.

According to the theory of similar matrices (Harville 1997), the estimated maximized
ESIM accuracy, $\hat{\lambda}_{E_1} = 0.6782$, should not be affected by matrix $F$.
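
The effect of the transformation $\hat{\beta} = F\hat{b}_{E_1}$ on the response and on the sign of the expected gains can be checked with a short Python (NumPy) sketch. The eigenvector is entered as reported in the text (rounded to four decimals); variable names are ours.

```python
import numpy as np

P = np.array([[240.57, -95.62, 2.07],
              [-95.62, 167.20, 4.58],
              [2.07, 4.58, 22.80]])
C = np.array([[29.86, -17.90, -4.13],
              [-17.90, 18.56, 1.49],
              [-4.13, 1.49, 9.24]])
k_i = 1.755

b_e = np.array([-0.1701, 0.0259, 0.9851])   # first ESIM eigenvector, as reported
F = np.diag([-9.0, 1.0, 1.0])               # flips and rescales the RL coefficient

beta = F @ b_e                              # transformed coefficient vector
sd_index = np.sqrt(beta @ P @ beta)
response = k_i * sd_index                   # response before rescaling
gain = k_i * (C @ beta) / sd_index          # expected genetic gain per trait
beta_sq_norm = beta @ beta                  # beta' beta, the divisor used in the text

print(np.round(beta, 3), round(response, 2), np.round(gain, 3))
print(round(beta_sq_norm, 3), round(response / beta_sq_norm, 3))
```

With this $F$, the gain for RL becomes positive and the gain for SM negative, which is the direction the breeder asked for in the example.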

We can compare ESIM efficiency versus LPSI efficiency to predict the net
genetic merit using the ratio of the estimated ESIM accuracy $\hat{\lambda}_{E_1} = 0.6782$ to
LPSI accuracy $\hat{\rho}_S = 0.362$, i.e., $\hat{\lambda}_E = \dfrac{\hat{\lambda}_{E_1}}{\hat{\rho}_S} = \dfrac{0.6782}{0.362} = 1.873$, or in percentage terms,
$\hat{p}_E = 100(1.873 - 1) = 87.3$ (see Chap. 5, Eq. 5.17). According to the latter
result, the ESIM is a better predictor of the net genetic merit and its efficiency
is 87.3% higher than that of the LPSI for this data set.
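
The percentage comparison is a one-line computation. As a small illustration, the helper below (a hypothetical function name of ours) evaluates $100(\hat{\lambda} - 1)$ for any pair of estimated accuracies:

```python
def efficiency_gain_percent(accuracy_a: float, accuracy_b: float) -> float:
    """Percentage by which index A is more efficient than index B: 100 * (a / b - 1)."""
    return 100.0 * (accuracy_a / accuracy_b - 1.0)


# ESIM versus LPSI accuracies from the poultry example.
p_e = efficiency_gain_percent(0.6782, 0.362)
print(round(p_e, 1))
```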

Now, we compare ESIM efficiency versus LPSI efficiency using the data set
described in Sect. 2.8.1 of Chap. 2. From this data set, we ran five phenotypic
selection cycles, each with four traits (T1, T2, T3, and T4), 500 genotypes, and four
replicates for each genotype. The economic weights for T1, T2, T3, and T4 were
1, -1, 1, and 1 respectively. In this case, matrix $F$ is an identity matrix of size 4 x 4
for all five selection cycles.

Table 7.1 presents the estimated LPSI, the restricted LPSI (RLPSI), and the
predetermined proportional gain LPSI (PPG-LPSI) selection response (the latter
two for one, two, and three restrictions) for five simulated selection cycles when
their vectors of coefficients are normalized. Table 7.1 also presents the estimated
ESIM, the RESIM and the PPG-ESIM selection response for one, two, and three
restrictions for five simulated selection cycles. The selection intensity was 10%
($k_I = 1.755$) for all five selection cycles. In this subsection, we compare only LPSI
results versus ESIM results. The estimated LPSI selection response when the vector
of coefficients was not normalized was described in Chap. 2 (Table 2.4). The
averages of the estimated LPSI and ESIM selection responses were 4.70 and 6.31
respectively.

Table 7.2 presents the estimated ESIM expected genetic gain per trait, accuracy ($\hat{\rho}_E$),
and the values $\hat{p}_E = 100(\hat{\lambda}_E - 1)$, where $\hat{\lambda}_E = \hat{\rho}_E/\hat{\rho}_S$ is the ratio of $\hat{\rho}_E$ to the
estimated LPSI accuracy ($\hat{\rho}_S$), expressed as percentages. Table 7.2 also presents the
accuracy of the PPG-ESIM and the estimated ratio ($\hat{p}_P$) of the estimated PPG-ESIM
accuracy to the estimated PPG-LPSI accuracy, expressed as percentages, for one, two,
and three predetermined restrictions for five simulated selection cycles. In this subsection,
we use only the estimated ESIM expected genetic gain per trait and $\hat{p}_E = 100(\hat{\lambda}_E - 1)$
to compare ESIM efficiency versus LPSI efficiency.

The estimated LPSI expected genetic gains per trait were presented in Chap. 2,
Table 2.4. According to the results shown in Table 2.4, the averages of the estimated
LPSI expected genetic gain per trait T1, T2, T3, and T4 for five simulated selection
cycles were 7.26, -3.52, 2.78, and 1.58, whereas according to the results of
Table 7.2, the averages of the estimated ESIM expected genetic gains per trait
were 5.67, -2.67, 1.81, and 2.97 respectively. This means that the estimated LPSI
expected genetic gain for traits T1, T2, and T3 was higher (in absolute value) than the
estimated ESIM expected genetic gain for those traits.

Table 7.1 Estimated linear phenotypic selection index (LPSI), restricted null LPSI (RLPSI), and
predetermined proportional gains LPSI (PPG-LPSI) selection responses when their vectors of
coefficients are normalized; estimated eigen selection index method (ESIM), restricted null ESIM
(RESIM), and predetermined proportional gain ESIM (PPG-ESIM) selection responses for one,
two, and three restrictions for five simulated selection cycles

RLPSI response for one, two, and three null restrictions; PPG-LPSI response for one, two, and
three predetermined restrictions

Cycle   | LPSI response | RLPSI 1 | RLPSI 2 | RLPSI 3 | PPG-LPSI 1 | PPG-LPSI 2 | PPG-LPSI 3
1       | 4.78 | 4.79 | 4.44 | 5.06 | 4.78 | 5.41 | 3.18
2       | 4.84 | 4.51 | 4.39 | 5.15 | 4.84 | 5.19 | 3.35
3       | 4.59 | 4.51 | 4.39 | 5.26 | 4.59 | 4.83 | 3.53
4       | 4.80 | 4.15 | 4.06 | 4.71 | 4.80 | 4.96 | 2.64
5       | 4.48 | 4.19 | 4.22 | 4.41 | 4.48 | 4.14 | 2.99
Average | 4.70 | 4.43 | 4.30 | 4.92 | 4.70 | 4.91 | 3.14

RESIM response for one, two, and three null restrictions; PPG-ESIM response for one, two, and
three predetermined restrictions

Cycle   | ESIM response | RESIM 1 | RESIM 2 | RESIM 3 | PPG-ESIM 1 | PPG-ESIM 2 | PPG-ESIM 3
1       | 8.88 | 4.78 | 4.64 | 4.57 | 8.88 | 7.1  | 7.4
2       | 6.13 | 4.86 | 4.69 | 4.69 | 6.13 | 6.04 | 7.3
3       | 5.44 | 4.96 | 4.79 | 4.68 | 5.44 | 5.87 | 6.91
4       | 4.84 | 4.30 | 4.19 | 4.19 | 4.84 | 4.91 | 5.77
5       | 6.24 | 3.79 | 3.78 | 3.78 | 6.24 | 7.49 | 6.39
Average | 6.31 | 4.54 | 4.42 | 4.38 | 6.31 | 6.28 | 6.75

The average of the $\hat{p}_E = 100(\hat{\lambda}_E - 1)$ values was 9.76 for all five selection cycles
(Table 7.2). The latter result is not in accordance with the LPSI and ESIM expected
genetic gain per trait; however, note that the $\hat{p}_E$ values are associated with the
estimated LPSI and ESIM selection responses (Table 7.1), not with the expected
genetic gain per trait, because $\hat{\lambda}_E = \dfrac{\hat{\rho}_E}{\hat{\rho}_S} \approx \dfrac{\hat{R}_E}{\hat{R}_S}$, where $\hat{R}_E$ and $\hat{R}_S$ are the estimated
ESIM and LPSI selection responses respectively. Thus, $\hat{p}_E$ values indicate that the
efficiency of the ESIM and that of the LPSI were very similar because the former
was only 9.76% higher than the latter for this data set.

The equality $\dfrac{\hat{\rho}_E}{\hat{\rho}_S} = \dfrac{\hat{R}_E}{\hat{R}_S}$ is true only when the denominators of both estimated
correlations are the same, as in the linear selection indices described in Chaps. 2 to 6.
Table 7.2 Estimated eigen selection index method (ESIM) expected genetic gain per trait,
accuracy ($\hat{\rho}_E$), and ratio of $\hat{\rho}_E$ to the estimated LPSI (data not presented) accuracy ($\hat{\rho}_S$), expressed
in percentage terms, $\hat{p}_E = 100(\hat{\lambda}_E - 1)$ (where $\hat{\lambda}_E = \hat{\rho}_E/\hat{\rho}_S$). Estimated PPG-ESIM accuracy ($\hat{\rho}_P$)
and estimated ratio ($\hat{p}_P$) of $\hat{\rho}_P$ to the estimated accuracy of the PPG-LPSI (data not presented),
expressed in percentages (%), for one, two, and three predetermined restrictions for five
simulated selection cycles

ESIM expected genetic gain per trait, ESIM accuracy, and $\hat{p}_E$ values (in %)

Cycle   | T1   | T2    | T3   | T4   | ESIM accuracy | p_E (%)
1       | 7.81 | -4.62 | 3.11 | 2.21 | 0.98 | 8.11
2       | 5.15 | -2.98 | 2.31 | 3.48 | 0.96 | 9.34
3       | 4.74 | -1.15 | 0.66 | 3.79 | 0.97 | 10.94
4       | 3.94 | -2.44 | 0.74 | 3.34 | 0.95 | 10.04
5       | 6.68 | -2.15 | 2.24 | 2.05 | 0.95 | 10.35
Average | 5.67 | -2.67 | 1.81 | 2.97 | 0.96 | 9.76

PPG-ESIM accuracies and $\hat{p}_P$ values (in %) for one, two, and three predetermined restrictions

Cycle   | Accuracy 1 | Accuracy 2 | Accuracy 3 | p_P 1 (%) | p_P 2 (%) | p_P 3 (%)
Average | 0.96 | 0.96 | 0.99 | 9.76 | 11.71 | 29.03


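In the ESIM, the estimated selection response is $\hat{R}_E = k_I\sqrt{\hat{b}_E'\hat{P}\hat{b}_E}$; this means that if
$\sqrt{\hat{w}_E'\hat{C}\hat{w}_E} \neq \sqrt{w'\hat{C}w}$, then $\dfrac{\hat{\rho}_E}{\hat{\rho}_S} \neq \dfrac{\hat{R}_E}{\hat{R}_S}$. For the
Akbar et al. (1984) data, $\hat{R}_E = 9.54$ and $\hat{R}_S = 4.75$, then $\dfrac{\hat{R}_E}{\hat{R}_S} = 2.0$ but
$\dfrac{\hat{\rho}_E}{\hat{\rho}_S} = 1.873$; that is, $\dfrac{\hat{\rho}_E}{\hat{\rho}_S} \approx \dfrac{\hat{R}_E}{\hat{R}_S}$, where "$\approx$" indicates an approximation.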


Figure 7.1 presents the frequency distribution of 500 estimated ESIM values for
cycle 2 (Fig. 7.1a) and cycle 5 (Fig. 7.1b), obtained from one selection cycle for
500 genotypes and four traits simulated in one environment. Figure 7.1a, b indicates
that the frequency distribution of the estimated ESIM values approaches a normal
distribution.
Fig. 7.1 Frequency distribution of 500 estimated eigen selection index method (ESIM) values for
(a) cycle 2 and (b) cycle 5, obtained from one selection cycle for 500 genotypes and four traits
simulated in one environment


7.2 The Linear Phenotypic Restricted Eigen Selection Index
Method


Similar to the RLPSI (see Chap. 3), the objective of the RESIM is to fix $r$ of
$t$ ($r < t$) traits by predicting only the genetic gains of ($t - r$) of them. Let $H = w'g$
be the net genetic merit and $I = b'y$ the ESIM index. In Chap. 2, we showed that
$\mathrm{Cov}(I, g) = Cb$ is the covariance between the breeding value vector ($g$) and $I = b'y$.
Thus, to fix $r$ of $t$ traits, we need $r$ covariances between the linear combinations
of $g$ ($U'g$) and $I = b'y$ to be zero, i.e., $\mathrm{Cov}(I, U'g) = U'Cb = 0$, where $U'$ is a
matrix with 1s and 0s (1 indicates that the trait is restricted and 0 that the trait has
no restrictions). In the RESIM, it is possible to solve this problem by maximizing

$$\rho_{HI}^2 = \frac{(w'Cb)^2}{(w'Cw)(b'Pb)}$$

with respect to vectors $b$ and $w$ under the restrictions
$U'Cb = 0$, $b'b = 1$, $w'Cw = 1$, and $b'Pb = 1$, where $w'Cw$ is the variance of
$H = w'g$ and $b'Pb$ is the variance of $I = b'y$. Also, the RESIM problem can be solved
by maximizing the index heritability $h_I^2$ (Eq. 7.12) with respect to vectors $b$ only under the restrictions
$U'Cb = 0$ and $b'b = 1$, as we did to obtain Eq. (7.13). Both approaches give the
same result, but it is easier to work with the second approach than with the first one.
7.2.1 The RESIM Parameters


To obtain the RESIM vector of coefficients that maximizes the RESIM selection
response and the expected genetic gain per trait, we need to maximize the function

$$f(b, v) = \frac{b'Cb}{\sqrt{b'Pb}} - v'U'Cb \tag{7.28a}$$

with respect to $b$ and $v'$, where $v' = [v_1 \;\; v_2 \;\; \cdots \;\; v_r]$ is a vector of Lagrange
multipliers. The derivatives of Eq. (7.28a) with respect to $b$ and $v'$ can be written as

$$2(b'Pb)^{1/2}Cb - (b'Pb)^{-1/2}(b'Cb)Pb - CUv = 0 \tag{7.28b}$$

and

$$U'Cb = 0, \tag{7.29}$$

respectively, where Eq. (7.29) denotes the restriction imposed for maximizing
Eq. (7.28a). Using algebraic methods on Eq. (7.28b) similar to those used to obtain
Eqs. (7.10) and (7.13), we get

$$(KP^{-1}C - h_{I_R}^2 I_t)b_R = 0, \tag{7.30}$$

where $K = [I_t - Q_R]$, $I_t$ is an identity matrix of size $t \times t$, $Q_R = P^{-1}CU(U'CP^{-1}CU)^{-1}U'C$,
and $h_{I_R}^2 = \dfrac{b_R'Cb_R}{b_R'Pb_R}$ is the maximized RESIM heritability obtained under the
restriction $U'Cb = 0$; $h_{I_R}^2$ is also the square of the maximized correlation between the
net genetic merit and $I_R = b_R'y$, that is, $h_{I_R}^2 = \lambda_R^2$. This means that Eq. (7.30) can be
written as

$$(KP^{-1}C - \lambda_R^2 I_t)b_R = 0. \tag{7.31}$$

Thus, the optimized RESIM index is $I_R = b_R'y$. The only difference between
Eqs. (7.31) and (7.13) is matrix $K$. Equation (7.31) was obtained by Cerón-Rojas
et al. (2008) by maximizing $\rho_{HI}^2$ (Eq. 7.1) with respect to vectors $b$ and $w$ under the
restrictions $U'Cb = 0$, $b'b = 1$, $w'Cw = 1$ and $b'Pb = 1$ in a similar manner to the
canonical correlation theory. The RESIM expected genetic gain per trait uses the first
eigenvector ($b_R$) of matrix $KP^{-1}C$, whereas the RESIM selection response uses $b_R$
and the first eigenvalue ($\lambda_R^2$) of matrix $KP^{-1}C$. When $U'$ is a null matrix, $b_R = b_E$ (the
vector of the ESIM coefficients); thus, the RESIM is more general than the ESIM
and includes the ESIM as a particular case.
In the RESIM context, vector $w$ can be obtained (Cerón-Rojas et al. 2008) as

$$w_R = C^{-1}[\lambda_R Pb_R + \Psi v], \tag{7.32}$$

where $\lambda_R$ and $b_R$ are the square root of the first eigenvalue ($\lambda_R^2$) and the first eigenvector
of matrix $KP^{-1}C$ respectively; $\Psi = CU$ and $v = \lambda_R^{-1}(\Psi'P^{-1}\Psi)^{-1}\Psi'P^{-1}Cb_R$. Let
$H_R = w_R'g$ be the net genetic merit in the RESIM context; then, because the
correlation between $I_R = b_R'y$ and $H_R = w_R'g$ is not affected by scale change, $\lambda_R$
and $\lambda_R^{-1}$ can be considered proportional constants and then $\Psi v$ can be written as
$\Psi v = \Psi(\Psi'P^{-1}\Psi)^{-1}\Psi'P^{-1}Cb_R = Q_R'Cb_R$, where $Q_R'$ is the transpose of matrix $Q_R$
described in Eq. (7.30). Thus, another way of writing Eq. (7.32) is

$$w_R = C^{-1}[P + Q_R'C]b_R. \tag{7.33}$$

By Eq. (7.33) and the restriction $b'\Psi = 0$, the covariance between $I_R = b_R'y$ and
$H_R = w_R'g$ ($\sigma_{H_R I_R}$) can be written as

$$\sigma_{H_R I_R} = w_R'Cb_R = b_R'Pb_R + b_R'Q_R'Cb_R = b_R'Pb_R, \tag{7.34}$$

where $b_R'Q_R'Cb_R = 0$ according to the restriction $b'\Psi = 0$. Equation (7.34) indicates
that the covariance between $I_R$ and $H_R$ ($\sigma_{H_R I_R}$) is equal to the variance of $I_R$ ($\sigma_{I_R}^2 =
b_R'Pb_R$).
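
The identity in Eq. (7.34) can be verified numerically. The sketch below, in Python with NumPy, builds $Q_R$, $K$ and $w_R$ (Eq. 7.33) for the poultry matrices of Sect. 7.1.5 with RL as the single restricted trait, and checks that $w_R'Cb_R = b_R'Pb_R$. It is an illustration under our own naming; following the estimation procedure of this chapter, $b_R$ is taken as the leading eigenvector of $S_RS_R'$ with $S_R = KP^{-1}C$.

```python
import numpy as np

P = np.array([[240.57, -95.62, 2.07],
              [-95.62, 167.20, 4.58],
              [2.07, 4.58, 22.80]])
C = np.array([[29.86, -17.90, -4.13],
              [-17.90, 18.56, 1.49],
              [-4.13, 1.49, 9.24]])

U = np.array([[1.0], [0.0], [0.0]])        # restrict the first trait (RL)
Psi = C @ U                                # Psi = C U
Pinv_Psi = np.linalg.solve(P, Psi)         # P^{-1} Psi

# Q_R = P^{-1} Psi (Psi' P^{-1} Psi)^{-1} Psi'  and  K = I - Q_R
Q_r = Pinv_Psi @ np.linalg.solve(Psi.T @ Pinv_Psi, Psi.T)
K = np.eye(3) - Q_r

S_r = K @ np.linalg.solve(P, C)            # S_R = K P^{-1} C
_, vecs = np.linalg.eigh(S_r @ S_r.T)      # ascending eigenvalues
b_r = vecs[:, -1]                          # leading eigenvector (sign is arbitrary)

w_r = np.linalg.solve(C, (P + Q_r.T @ C) @ b_r)   # Eq. (7.33)

restriction = (U.T @ C @ b_r).item()       # U'C b_R, should vanish
cov_hi = w_r @ C @ b_r                     # covariance between H_R and I_R
var_i = b_r @ P @ b_r                      # variance of I_R

print(restriction, cov_hi, var_i)
```

The check does not depend on the sign chosen for the eigenvector, because both sides of Eq. (7.34) are quadratic in $b_R$.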

The maximized correlation between $I_R$ and $H_R$ (or RESIM accuracy) can be
written as

$$\rho_{H_R I_R} = \frac{\sqrt{b_R'Pb_R}}{\sqrt{w_R'Cw_R}}, \tag{7.35}$$

where $w_R'Cw_R = \sigma_{H_R}^2$ is the variance of $H_R$, $w_R = C^{-1}[P + Q_R'C]b_R$, $Q_R' =
\Psi(\Psi'P^{-1}\Psi)^{-1}\Psi'P^{-1}$, and $\Psi = CU$. When $U'$ is a null matrix, $w_R'Cw_R =
b_E'PC^{-1}Pb_E = w_E'Cw_E$, the variance of $H_E$, and $\sigma_{I_R}^2 = b_R'Pb_R = b_E'Pb_E = \sigma_{I_E}^2$, the
variance of $I_E$. Hereafter, to simplify the notation, we write Eq. (7.35) as $\rho_R$ or $\lambda_R$.

The maximized selection response ($R_R$) and expected genetic gain per trait ($E_R$) of
the RESIM can be written as

$$R_R = k_I\sqrt{b_R'Pb_R} \tag{7.36}$$

and

$$E_R = k_I\frac{Cb_R}{\sqrt{b_R'Pb_R}} \tag{7.37}$$

respectively, where $\sqrt{b_R'Pb_R} = \sigma_{I_R}$ is the standard deviation of
$I_R = b_R'y$. If vector $b_R$ is transformed as $\beta_R = Fb_R$, where matrix $F$ was defined
earlier, vector $b_R$ should be replaced by $\beta_R$ in Eqs. (7.36) and (7.37), and in $I_R = b_R'y$.

Equation (7.36) can also be written as $R_R = k_I\sigma_{H_R}\lambda_R$, where $\sigma_{H_R} =
\sqrt{b_R'PC^{-1}Pb_R + b_R'PC^{-1}Q_R'Cb_R}$ is the standard deviation of $H_R$,
and $\lambda_R = \rho_{H_R I_R}$ is the first canonical correlation between $H_R = w_R'g$ and $I_R = b_R'y$.
When $\sigma_{H_R} = 1$, $\lambda_R$ is the covariance between $H_R = w_R'g$ and $I_R = b_R'y$, and then
Eq. (7.36) can be written as $R_R = k_I\lambda_R$. This last result was presented by Cerón-Rojas
et al. (2008) in their original paper.

The ratio of the index accuracies and the VPE are also valid in the RESIM
context. In Eq. (7.34) we showed that the covariance between $I_R = b_R'y$ and $H_R =
w_R'g$ ($\sigma_{H_R I_R}$) is equal to the variance of $I_R = b_R'y$ ($\sigma_{I_R}^2$). This means that the VPE of the
RESIM can be written as

$$E\left[(H_R - I_R)^2\right] = \sigma_{H_R}^2 + \sigma_{I_R}^2 - 2\sigma_{H_R I_R} = \sigma_{H_R}^2 - \sigma_{I_R}^2 = \left(1 - \rho_{H_R I_R}^2\right)\sigma_{H_R}^2. \tag{7.38}$$

Statistical properties associated with the ESIM and described in Sect. 7.1.2 are
also valid for the RESIM.


7.2.2 Estimating the RESIM Parameters 


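We can estimate the RESIM parameters in a similar manner to the ESIM parameters
in Sect. 7.1.4. With matrices $\hat{C}$ and $\hat{P}$, we constructed matrix $\hat{S}_R = \hat{K}\hat{P}^{-1}\hat{C}$ and
equation

$$(\hat{S}_R\hat{S}_R' - \hat{\mu}_{R_j}I_t)\hat{b}_{R_j} = 0, \tag{7.39}$$

where $\hat{\mu}_{R_j} = \hat{\lambda}_{R_j}^4$, $j = 1, 2, \ldots, t$. The estimated RESIM index ($I_R = b_R'y$) is $\hat{I}_R = \hat{b}_R'y$
and the estimators of the maximized RESIM selection response and its expected genetic
gain per trait can be denoted as

$$\hat{R}_R = k_I\sqrt{\hat{b}_R'\hat{P}\hat{b}_R} \quad\text{and}\quad \hat{E}_R = k_I\frac{\hat{C}\hat{b}_R}{\sqrt{\hat{b}_R'\hat{P}\hat{b}_R}}$$

respectively, whereas the estimator of the maximized RESIM accuracy is $\hat{\lambda}_R$.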


7.2.3 Numerical Examples 


We compare the RLPSI results with those of the RESIM using the Akbar et al.
(1984) data described in Sect. 7.1.5. We restrict the trait RL (number of eggs) in both
indices. In Chap. 3, Sect. 3.1.3, we indicated how to construct matrix $U'$ and, in Sect.
3.1.4 of the same chapter, we described how to obtain matrix $K = [I_t - Q]$ for one
and two restrictions. Matrix $K$ is the same for the RLPSI and the RESIM. Thus, in
this subsection we omit the steps needed to construct matrices $U'$ and $K$.

First, we estimate the RLPSI parameters. Assume a selection intensity of 10%
($k_I = 1.755$) and a vector of economic weights $w' = [19.54 \;\; -3.56 \;\; 17.01]$.
The estimated RLPSI vector of coefficients for one restriction was
$\hat{b}' = [0.29 \;\; -0.84 \;\; 5.78]$, and the estimated selection response, expected genetic
gain per trait, accuracy, and heritability of the RLPSI were

$$\hat{R} = 1.755\sqrt{\hat{b}'\hat{P}\hat{b}} = 53.01, \qquad \hat{E}' = 1.755\frac{\hat{b}'\hat{C}}{\sqrt{\hat{b}'\hat{P}\hat{b}}} = [0 \;\; -0.71 \;\; 2.96],$$

$$\hat{\rho} = \frac{\sqrt{\hat{b}'\hat{P}\hat{b}}}{\sqrt{w'\hat{C}w}} = 0.26, \quad\text{and}\quad \hat{h}^2 = \frac{\hat{b}'\hat{C}\hat{b}}{\hat{b}'\hat{P}\hat{b}} = 0.33$$

respectively. In this case, $\hat{b}'\hat{b} = 34.25$; then, the estimated RLPSI
selection response using the normalized RLPSI vector of coefficients was
$\hat{R} = \dfrac{53.01}{34.25} = 1.55$, and the rest of the estimated RLPSI parameters were the same.

In the RESIM, matrix $F$ was an identity matrix of size 3 x 3; that is, we did not
use matrix $F$ to transform the RESIM vector of coefficients. In Sect. 7.1.5 we
obtained matrix

$$\hat{P}^{-1}\hat{C} = \begin{bmatrix} 0.1102 & -0.0405 & -0.0280 \\ -0.0390 & 0.0864 & -0.0184 \\ -0.1833 & 0.0517 & 0.4115 \end{bmatrix},$$

and we have indicated that matrix $K$ is the same for the RLPSI and the RESIM. In the RESIM, we
need matrix $\hat{S}_R = \hat{K}\hat{P}^{-1}\hat{C}$ to solve equation $(\hat{S}_R\hat{S}_R' - \hat{\mu}_{R_j}I)\hat{b}_{R_j} = 0$, where
$\hat{\mu}_{R_j} = \hat{\lambda}_{R_j}^4$, whence we shall obtain the eigenvalues and eigenvectors that form
matrices $\hat{L}_R^{1/2}$, $\hat{V}_{R_1}$, and $\hat{S}_R = \hat{V}_{R_1}\hat{L}_R^{1/2}\hat{V}_{R_2}'$.

For one null restriction, matrix

$$\hat{S}_R = \hat{K}\hat{P}^{-1}\hat{C} = \begin{bmatrix} 0 & 0.0285 & 0.0232 \\ 0 & 0.0620 & -0.0365 \\ 0 & -0.0630 & 0.3263 \end{bmatrix}.$$

This means that $\hat{S}_R$ reflects the trait restrictions imposed on the covariance
between the RESIM and the vector of genotypic values; thus, if $r$ traits are
restricted, $r$ columns of $\hat{S}_R$ are equal to zero. Matrix

$$\hat{S}_R\hat{S}_R' = \begin{bmatrix} 0.0013 & 0.0009 & 0.0058 \\ 0.0009 & 0.0052 & -0.0158 \\ 0.0058 & -0.0158 & 0.1104 \end{bmatrix} \quad\text{and}\quad \hat{V}_{R_1} = \begin{bmatrix} 0.0500 & 0.5216 & -0.8517 \\ -0.1446 & 0.8476 & 0.5106 \\ 0.9882 & 0.0976 & 0.1178 \end{bmatrix},$$

whereas the $\hat{\mu}_{R_j} = \hat{\lambda}_{R_j}^4$ values were 0.1130, 0.0039, and 0.0, whence

$$\hat{L}_R^{1/2} = \begin{bmatrix} 0.3362 & 0 & 0 \\ 0 & 0.0626 & 0 \\ 0 & 0 & 0.0 \end{bmatrix}.$$

Thus, $\hat{\mu}_{R_1} = \hat{\lambda}_{R_1}^4 = 0.1130$, $\hat{\lambda}_{R_1}^2 = 0.3362$, and the
estimated RESIM accuracy was $\hat{\lambda}_{R_1} = 0.5798$. The estimated RESIM
eigenvector, index, selection response, and expected genetic gain per trait
were $\hat{b}_{R_1}' = [0.0500 \;\; -0.1446 \;\; 0.9882]$, $\hat{I}_R = 0.0500\,\mathrm{RL} - 0.1446\,\mathrm{SM} + 0.9882\,\mathrm{EW}$,
$\hat{R}_R = 1.755\sqrt{\hat{b}_{R_1}'\hat{P}\hat{b}_{R_1}} = 9.06$, and $\hat{E}_R' = 1.755\dfrac{\hat{b}_{R_1}'\hat{C}}{\sqrt{\hat{b}_{R_1}'\hat{P}\hat{b}_{R_1}}} = [0 \;\; -0.72 \;\; 2.96]$
respectively.

The estimated RLPSI selection response was $\hat{R} = \dfrac{53.01}{34.25} = 1.55$; thus, the
estimated RESIM selection response was higher than the estimated RLPSI response.
In addition, the estimated RLPSI expected genetic gain per trait was
$\hat{E}' = [0 \;\; -0.71 \;\; 2.96]$, which is essentially the same as the estimated RESIM expected
genetic gain per trait.
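
Both sets of restricted results can be reproduced with a short Python (NumPy) sketch. It assumes, as stated in the text, that RL is the restricted trait and that $K = I_t - Q$ with $Q = P^{-1}CU(U'CP^{-1}CU)^{-1}U'C$; for the RLPSI it uses $\hat{b} = \hat{K}\hat{P}^{-1}\hat{C}w$, which is our reading of Chap. 3 and should be treated as an assumption here. Variable names are ours, and the last digit may differ slightly from the printed values because the published inputs are rounded.

```python
import numpy as np

P = np.array([[240.57, -95.62, 2.07],
              [-95.62, 167.20, 4.58],
              [2.07, 4.58, 22.80]])
C = np.array([[29.86, -17.90, -4.13],
              [-17.90, 18.56, 1.49],
              [-4.13, 1.49, 9.24]])
w = np.array([19.54, -3.56, 17.01])
k_i = 1.755


def index_summary(b):
    """Selection response and expected genetic gain per trait for coefficients b."""
    sd_index = np.sqrt(b @ P @ b)
    return k_i * sd_index, k_i * (C @ b) / sd_index


# Restriction matrix for one null restriction on RL, and projector K = I - Q.
U = np.array([[1.0], [0.0], [0.0]])
Psi = C @ U
Pinv_Psi = np.linalg.solve(P, Psi)
K = np.eye(3) - Pinv_Psi @ np.linalg.solve(Psi.T @ Pinv_Psi, Psi.T)

T = np.linalg.solve(P, C)                  # P^{-1} C
S_r = K @ T                                # S_R = K P^{-1} C

# RLPSI (assumed form): b = K P^{-1} C w.
b_rlpsi = S_r @ w
rlpsi_response, rlpsi_gain = index_summary(b_rlpsi)
rlpsi_accuracy = np.sqrt(b_rlpsi @ P @ b_rlpsi) / np.sqrt(w @ C @ w)

# RESIM: leading eigenvector of S_R S_R' (Eq. 7.39), sign fixed so EW is positive.
eigvals, eigvecs = np.linalg.eigh(S_r @ S_r.T)
b_resim = eigvecs[:, -1]
if b_resim[2] < 0:
    b_resim = -b_resim
resim_accuracy = eigvals[-1] ** 0.25       # lambda_R1 = mu_R1^(1/4)
resim_response, resim_gain = index_summary(b_resim)

print(np.round(b_rlpsi, 2), round(rlpsi_response, 2), np.round(rlpsi_gain, 2))
print(np.round(b_resim, 4), round(resim_response, 2), np.round(resim_gain, 2))
print(round(rlpsi_accuracy, 2), round(resim_accuracy, 4))
```

In both indices the expected gain for RL is zero up to floating-point error, which is the purpose of the null restriction.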

We can compare RESIM efficiency versus RLPSI efficiency to predict the net
genetic merit using the ratio of the estimated RESIM accuracy $\hat{\lambda}_{R_1} = 0.5798$ to the
RLPSI accuracy $\hat{\rho} = 0.26$, i.e., $\hat{\lambda}_R = \dfrac{\hat{\lambda}_{R_1}}{\hat{\rho}} = \dfrac{0.5798}{0.26} = 2.23$, or in percentage terms,
$\hat{p}_R = 100(2.23 - 1) = 123$ (see Chap. 5, Eq. 5.17). That is, the RESIM is a better
predictor of the net genetic merit and its efficiency was 123% higher than the
RLPSI efficiency for this data set.

Now, we compare RESIM efficiency versus RLPSI efficiency using the simulated
data set described in Sect. 2.8.1 of Chap. 2 for five phenotypic selection cycles, each
with four traits (T1, T2, T3, and T4), 500 genotypes, and four replicates for each
genotype. The economic weights for T1, T2, T3, and T4 were 1, -1, 1, and 1 respectively.
For this data set, matrix $F$ was equal to an identity matrix of size 4 x 4 for all
five selection cycles.

The first and second parts of columns 3, 4, and 5 of Table 7.1 present the
estimated RLPSI and RESIM selection responses respectively for one, two, and
three null restrictions for five simulated selection cycles, where the selection intensity
was 10% ($k_I = 1.755$) for all five selection cycles. The averages of the estimated
RLPSI selection response for each null restriction were 4.43, 4.30, and 4.92, whereas
the averages of the estimated RESIM selection response were 4.54, 4.42, and 4.38
respectively. These results indicate that the estimated RLPSI selection response was
greater than the estimated RESIM selection response only for three null restrictions.

The first part of Table 7.3 presents the estimated RESIM expected genetic gain
per trait for one, two, and three restrictions for five simulated selection cycles. The
estimated RLPSI expected genetic gains per trait for one, two, and three restrictions
are given in Chap. 3 (Table 3.3). According to the results shown in Table 3.3
(Chap. 3), the averages of the estimated RLPSI expected genetic gains per trait for
five simulated selection cycles were -2.52, 2.25, and 2.26 for one restriction; 2.84
and 2.65 for two restrictions; and 3.90 for three restrictions. According to the results
shown in Table 7.3, the averages of the estimated RESIM expected genetic gains per
trait for five simulated selection cycles were -0.43, -0.75, and 3.90 for one
restriction; -0.59 and 3.89 for two restrictions; and 3.90 for three restrictions.
This means that the RESIM and RLPSI were the same only for three restrictions,
whereas for one and two restrictions, the average of the estimated RESIM expected
genetic gains per trait was higher than that of the estimated RLPSI expected genetic
gains per trait only for trait 4.

Table 7.3 Estimated RESIM and PPG-ESIM expected genetic gain per trait for one, two, and three
restrictions for five simulated selection cycles. The selection intensity was 10% ($k_I = 1.755$) and
the vectors of the PPG for each predetermined restriction were $d_1' = 7$, $d_2' = [7 \;\; -3]$ and
$d_3' = [7 \;\; -3 \;\; 5]$ respectively

Estimated RESIM expected genetic gain per trait

        | One null restriction        | Two null restrictions     | Three null restrictions
Cycle   | T1 | T2    | T3     | T4   | T1 | T2 | T3    | T4   | T1 | T2 | T3 | T4
1       | 0  | -0.86 | -1.22  | 4.14 | 0  | 0  | -0.96 | 4.12 | 0  | 0  | 0  | 4.13
2       | 0  | -1.38 | -0.004 | 4.31 | 0  | 0  | -0.07 | 4.27 | 0  | 0  | 0  | 4.27
3       | 0  | 1.36  | -1.74  | 4.07 | 0  | 0  | -1.39 | 4.09 | 0  | 0  | 0  | 4.16
4       | 0  | -1.13 | -0.34  | 3.73 | 0  | 0  | -0.08 | 3.72 | 0  | 0  | 0  | 3.72
5       | 0  | -0.14 | -0.43  | 3.22 | 0  | 0  | -0.43 | 3.22 | 0  | 0  | 0  | 3.24
Average | 0  | -0.43 | -0.75  | 3.90 | 0  | 0  | -0.59 | 3.89 | 0  | 0  | 0  | 3.90

Estimated PPG-ESIM expected genetic gain per trait

        | One predetermined restriction | Two predetermined restrictions | Three predetermined restrictions
Cycle   | T1   | T2    | T3   | T4   | T1   | T2    | T3   | T4   | T1   | T2    | T3   | T4
1       | 7.81 | -4.62 | 3.11 | 2.21 | 7.09 | -3.04 | 3.12 | 2.76 | 6.62 | -2.84 | 4.73 | 0.83
2       | 5.15 | -2.98 | 2.31 | 3.48 | 5.41 | -2.32 | 2.41 | 3.48 | 6.14 | -2.63 | 4.39 | 0.92
3       | 4.74 | -1.15 | 0.66 | 3.79 | 5.45 | -2.34 | 1.24 | 3.26 | 5.52 | -2.37 | 3.94 | 1.35
4       | 3.94 | -2.44 | 0.74 | 3.34 | 4.57 | -1.96 | 1.17 | 3.24 | 5.03 | -2.15 | 3.59 | 0.30
5       | 6.68 | -2.15 | 2.24 | 2.05 | 6.93 | -2.97 | 2.25 | 1.4  | 5.25 | -2.25 | 3.75 | 0.72
Average | 5.67 | -2.67 | 1.81 | 2.97 | 5.89 | -2.52 | 2.04 | 2.83 | 5.71 | -2.45 | 4.08 | 0.82

Figure 7.2 presents the estimated accuracy of the RLPSI and the RESIM for one,
two, and three null restrictions for five simulated selection cycles. In all five selection
cycles, the estimated RESIM accuracy was greater than the RLPSI accuracy. This
means that the RESIM is a better predictor of the net genetic merit than the RLPSI.
Additional results associated with the frequency distribution of the estimated RESIM
values are presented in Fig. 7.3. Figure 7.3a presents the frequency distribution of
the estimated RESIM values with one null restriction for cycle 2, whereas Fig. 7.3b
presents the frequency distribution of the estimated RESIM values with two null
restrictions for cycle 5; both figures indicate that the estimated RESIM values
approach a normal distribution.

Finally, in Chap. 10 we present the results of comparing the ESIM with the LPSI 
and the RESIM with the RLPSI for many selection cycles. Such results are similar to 
those obtained in this chapter. 


Fig. 7.2 Estimated correlation values between the restricted linear phenotypic selection index
(RLPSI) and the net genetic merit ($H = w'g$); estimated correlation values between the restricted
eigen selection index method (RESIM) and $H$ for one, two and three null restrictions for four traits
and 500 genotypes in one environment simulated for five selection cycles (vertical axis: accuracy
values; horizontal axis: cycle)


Fig. 7.3 Frequency distribution of 500 estimated RESIM values for (a) cycle 2, with one null
restriction, and (b) cycle 5, with two null restrictions, obtained from one selection cycle for
500 genotypes and four traits simulated in one environment


7.3 The Linear Phenotypic Predetermined Proportional 
Gain Eigen Selection Index Method 


In a similar manner to the PPG-LPSI (see Chap. 3), in the PPG-ESIM the breeder
pre-sets optimal levels (predetermined proportional gains) on certain traits before the
selection is carried out. Let $d' = [d_1 \;\; d_2 \;\; \cdots \;\; d_r]$ be the vector of the PPGs
(predetermined proportional gains) imposed by the breeder on $r$ traits and assume
that $\mu_q$ is the population mean of the $q$th trait before selection. The objective of the
PPG-ESIM is to change $\mu_q$ to $\mu_q + d_q$, where $d_q$ is a predetermined change in $\mu_q$
(in the RESIM, $d_q = 0$, $q = 1, 2, \ldots, r$, where $r$ is the number of PPGs). That is, the
PPG-ESIM attempts to make some traits change their expected genetic gain values
based on a predetermined level, whereas the rest of the traits remain without
restrictions.

The simplest way to solve the foregoing problem is by maximizing the
PPG-ESIM heritability under the restriction $D'U'Cb = 0$, where

$$D' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix}$$

(see Chap. 3 for details) is a matrix of size $(r - 1) \times r$,
$r$ is the number of PPGs, $d_q$ ($q = 1, 2, \ldots, r$) is the $q$th element of vector $d$, $U'$ is the
RLPSI matrix of restrictions of 1s and 0s, and $C$ is the covariance matrix of
genotypic values. Matrix $D'$ is a Mallard (1972) matrix of PPGs used to impose
predetermined restrictions.

The Mallard (1972) matrix of predetermined restrictions can be written as $M'
= D'\Psi'$, where $\Psi' = U'C$ and $U'$ is the Kempthorne and Nordskog (1959) matrix of
restrictions of 1s and 0s (1 indicates that the trait is restricted, i.e., $d_q = 0$, and 0 that
the trait has no restrictions).
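
The structure of the Mallard matrix is easy to see in code. The following Python (NumPy) sketch builds $D'$ from a vector $d$, reading the layout as $D' = [\,d_r I_{r-1} \;|\; -d_{1:r-1}\,]$; the function name is ours. The example vector $d' = [7 \;\; -3 \;\; 5]$ is the one used for three predetermined restrictions in the simulated data of this chapter.

```python
import numpy as np


def mallard_matrix(d):
    """Return the (r - 1) x r Mallard matrix D' for a PPG vector d of length r >= 2."""
    d = np.asarray(d, dtype=float)
    r = d.size
    left = d[-1] * np.eye(r - 1)           # d_r on the diagonal of the first r - 1 columns
    right = -d[:-1].reshape(-1, 1)         # last column holds -d_1, ..., -d_{r-1}
    return np.hstack([left, right])


d = np.array([7.0, -3.0, 5.0])
D_t = mallard_matrix(d)

print(D_t)
print(D_t @ d)                             # D'd, zero by construction
```

Each row of $D'$ encodes one proportionality constraint of the form $d_r g_q - d_q g_r = 0$, so the restricted gains are forced to be proportional to $d$ rather than equal to zero.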

To find the PPG-ESIM vector of coefficients that maximizes the PPG-ESIM
selection response and expected genetic gain per trait, we can maximize

$$\rho_{HI}^2 = \frac{(w'Cb)^2}{(w'Cw)(b'Pb)}$$

with respect to vectors $b$ and $w$ under the restrictions $M'b = 0$,
$b'b = 1$, $w'Cw = 1$, and $b'Pb = 1$, where $w'Cw$ is the variance of $H = w'g$ and $b'Pb$
is the variance of $I = b'y$, as did Cerón-Rojas et al. (2016) according to the canonical
correlation theory, or we can solve this problem by maximizing the index heritability $h_I^2$ (Eq. 7.12) only
with respect to vectors $b$ under the restrictions $M'b = 0$ and $b'b = 1$, as we did to
obtain the RESIM vector of coefficients. Both approaches give the same result, but
we use the latter approach because it is easier to work with.
7.3.1 The PPG-ESIM Parameters


To obtain the PPG-ESIM vector of coefficients, we need to maximize the function 


b'Cb 


Е, у’) = УМ 7.40 
(bv) = тв (740) 
with respect to vectors b and у’, where v = [vi vo +++ vı] is a vector of 


Lagrange multipliers. The derivatives of Eq. (7.40) with respect to b and v' were: 


1/2 


2(b/Pb)'" Cb — (b'Pb) '""(b/Cb)Pb — Mv = 0 (7.41) 


and 
МЪ = 0, (7.42) 


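respectively, where Eq. (7.42) denotes the restriction imposed for maximizing
Eq. (7.40). By using algebraic methods on Eq. (7.41) similar to those used to obtain
Eq. (7.10) we get

$$(\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C} - \lambda_P^2\mathbf{I}_t)\mathbf{b}_P = \mathbf{0}, \tag{7.43}$$

where $\mathbf{K}_P = [\mathbf{I}_t - \mathbf{Q}_P]$, $\mathbf{Q}_P = \mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D}(\mathbf{D}'\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D})^{-1}\mathbf{D}'\boldsymbol{\Psi}'$, $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$, $\mathbf{I}_t$ is an
identity matrix $t \times t$, and $\lambda_P^2 = h_{I_P}^2$ and $\mathbf{b}_P$ are the first eigenvalue and the first
eigenvector of matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$ respectively. Note that $\lambda_P^2$ is the PPG-ESIM heritability
and $\lambda_P$ is the maximum correlation between $I_P = \mathbf{b}_P'\mathbf{y}$ and $H = \mathbf{w}'\mathbf{g}$. When $\mathbf{D}' = \mathbf{U}'$,
$\mathbf{b}_P = \mathbf{b}_R$ (the vector of coefficients of the RESIM), and when $\mathbf{U}'$ is a null matrix,
$\mathbf{b}_P = \mathbf{b}_E$ (the vector of coefficients of the ESIM). That is, the PPG-ESIM is more general
than the RESIM and the ESIM and includes the latter two indices as particular cases.
Matrices $\mathbf{K}_P = [\mathbf{I}_t - \mathbf{Q}_P]$ and $\mathbf{Q}_P = \mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D}(\mathbf{D}'\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D})^{-1}\mathbf{D}'\boldsymbol{\Psi}'$ are the same as
those obtained in the PPG-LPSI (see Chap. 3). Also, vector $\mathbf{b}_P$ can be transformed as
$\boldsymbol{\beta}_P = \mathbf{F}\mathbf{b}_P$; matrix $\mathbf{F}$ was defined earlier.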
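A minimal numerical sketch of Eq. (7.43) may help here. It assumes small made-up covariance matrices `P` and `C`, a hypothetical selector matrix `U` for two restricted traits, and a matrix `D` chosen only so that $\mathbf{D}'\mathbf{d} = \mathbf{0}$ for $\mathbf{d}' = [3 \; -1]$; none of these values come from the data sets used in this book, and the function name is illustrative. The sketch builds $\mathbf{K}_P$, takes the first eigenvector of $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$, and checks that it satisfies the restriction $\mathbf{M}'\mathbf{b}_P = \mathbf{0}$, so that the gains of the restricted traits are proportional to $\mathbf{d}$.

```python
import numpy as np

def ppg_esim_first_eigenpair(P, C, U, D):
    """Return (lambda_sq, b, M) for the first eigenpair of K_P P^{-1} C (Eq. 7.43)."""
    t = P.shape[0]
    Psi = C @ U                        # Psi = C U, so Psi' = U' C
    M = Psi @ D                        # M' = D' Psi'
    P_inv = np.linalg.inv(P)
    Q = P_inv @ M @ np.linalg.inv(M.T @ P_inv @ M) @ M.T   # Q_P
    K = np.eye(t) - Q                                      # K_P
    vals, vecs = np.linalg.eig(K @ P_inv @ C)
    first = int(np.argmax(vals.real))  # eigenvalues are real in theory; drop round-off
    b = vecs[:, first].real
    return vals.real[first], b / np.linalg.norm(b), M

# Illustrative (made-up) inputs: t = 3 traits, the first two restricted.
C = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
P = C + np.diag([6.0, 5.0, 4.0])
U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
d = np.array([3.0, -1.0])
D = np.array([[-1.0], [-3.0]])         # D'd = -3 + 3 = 0

lam_sq, b, M = ppg_esim_first_eigenpair(P, C, U, D)
gains = U.T @ C @ b                    # proportional to the gains of the restricted traits

print("lambda^2:", lam_sq)
print("M'b:", M.T @ b)
print("restricted gains:", gains)
```

The sign of `gains` relative to `d` is not fixed by the eigenproblem, which is one reason a transformation such as $\boldsymbol{\beta}_P = \mathbf{F}\mathbf{b}_P$ can be needed in practice.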

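Let $\mathbf{S} = \boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}$; then, under the assumption $\mathbf{D}'\mathbf{d} = \mathbf{0}$, it is possible to show
that $\mathbf{D}(\mathbf{D}'\mathbf{S}\mathbf{D})^{-1}\mathbf{D}' = \mathbf{S}^{-1} - \mathbf{S}^{-1}\mathbf{d}(\mathbf{d}'\mathbf{S}^{-1}\mathbf{d})^{-1}\mathbf{d}'\mathbf{S}^{-1}$ (see Chap. 3), whence by
substituting $\mathbf{S}^{-1} - \mathbf{S}^{-1}\mathbf{d}(\mathbf{d}'\mathbf{S}^{-1}\mathbf{d})^{-1}\mathbf{d}'\mathbf{S}^{-1}$ for $\mathbf{D}(\mathbf{D}'\mathbf{S}\mathbf{D})^{-1}\mathbf{D}'$ in matrix
$\mathbf{Q}_P = \mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D}(\mathbf{D}'\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{D})^{-1}\mathbf{D}'\boldsymbol{\Psi}'$, matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$ can be written as

$$\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C} = [\mathbf{I}_t - \mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{S}^{-1}\boldsymbol{\Psi}']\mathbf{P}^{-1}\mathbf{C} + \mathbf{A}_P, \tag{7.44}$$

where $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$, $\mathbf{A}_P = \boldsymbol{\delta}\boldsymbol{\alpha}'$, $\boldsymbol{\delta} = \mathbf{P}^{-1}\boldsymbol{\Psi}(\boldsymbol{\Psi}'\mathbf{P}^{-1}\boldsymbol{\Psi})^{-1}\mathbf{d}$, and
$\boldsymbol{\alpha}' = \dfrac{\mathbf{d}'\mathbf{S}^{-1}\boldsymbol{\Psi}'\mathbf{P}^{-1}\mathbf{C}}{\mathbf{d}'\mathbf{S}^{-1}\mathbf{d}}$. When $\mathbf{A}_P$ is a null matrix, $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C} = \mathbf{K}\mathbf{P}^{-1}\mathbf{C}$
(matrix of the RESIM), and if $\mathbf{U}'$ is a null matrix, $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C} = \mathbf{P}^{-1}\mathbf{C}$ (matrix of the ESIM).
This means that Eq. (7.44) is a mathematically equivalent form of matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$ and
that Eq. (7.44) does not require matrix $\mathbf{D}'$. The easiest way to obtain $\mathbf{b}_P$ and $\lambda_P^2$ is to
use matrix $[\mathbf{I}_t - \mathbf{P}^{-1}\boldsymbol{\Psi}\mathbf{S}^{-1}\boldsymbol{\Psi}']\mathbf{P}^{-1}\mathbf{C} + \mathbf{A}_P$ in Eq. (7.43) instead of matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$.

In the PPG-ESIM context, vector $\mathbf{w}$ can be obtained as

$$\mathbf{w}_P = \mathbf{C}^{-1}[\lambda_P\mathbf{P}\mathbf{b}_P + \mathbf{M}\mathbf{v}_P], \tag{7.45}$$

whence $H = \mathbf{w}'\mathbf{g}$ can be written as $H_P = \mathbf{w}_P'\mathbf{g}$. In Eq. (7.45), $\lambda_P$ is the
maximum correlation between $I_P = \mathbf{b}_P'\mathbf{y}$ and $H_P = \mathbf{w}_P'\mathbf{g}$, $\mathbf{b}_P$ is the first eigenvector
of matrix $\mathbf{K}_P\mathbf{P}^{-1}\mathbf{C}$, $\mathbf{v}_P = \lambda_P^{-1}(\mathbf{M}'\mathbf{P}^{-1}\mathbf{M})^{-1}\mathbf{M}'\mathbf{P}^{-1}\mathbf{C}\mathbf{b}_P$, $\mathbf{M}' = \mathbf{D}'\boldsymbol{\Psi}'$, and $\boldsymbol{\Psi}' = \mathbf{U}'\mathbf{C}$.
In a similar manner to the RESIM context, we can assume that $\lambda_P$ and $\lambda_P^{-1}$ are
proportionality constants and it can be shown that the covariance between $I_P = \mathbf{b}_P'\mathbf{y}$
and $H_P = \mathbf{w}_P'\mathbf{g}$ ($\sigma_{H_P I_P}$) is equal to the variance of $I_P = \mathbf{b}_P'\mathbf{y}$ ($\sigma_{I_P}^2 = \mathbf{b}_P'\mathbf{P}\mathbf{b}_P$), that is,
$\sigma_{H_P I_P} = \mathbf{w}_P'\mathbf{C}\mathbf{b}_P = \mathbf{b}_P'\mathbf{P}\mathbf{b}_P$.
The accuracy of the PPG-ESIM can also be written as

$$\rho_{H_P I_P} = \frac{\sqrt{\mathbf{b}_P'\mathbf{P}\mathbf{b}_P}}{\sqrt{\mathbf{w}_P'\mathbf{C}\mathbf{w}_P}}, \tag{7.46}$$

where $\sigma_{H_P}^2 = \mathbf{w}_P'\mathbf{C}\mathbf{w}_P = \mathbf{b}_P'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}_P + \mathbf{b}_P'\mathbf{P}\mathbf{C}^{-1}\mathbf{Q}_P'\mathbf{C}\mathbf{b}_P$ is the variance of $H_P$. When
$\mathbf{D}' = \mathbf{U}'$, $\mathbf{w}_P'\mathbf{C}\mathbf{w}_P = \mathbf{w}_R'\mathbf{C}\mathbf{w}_R$ (the variance of $H_R$), and when $\mathbf{U}'$ is a null matrix,
$\mathbf{w}_P'\mathbf{C}\mathbf{w}_P = \mathbf{w}_E'\mathbf{C}\mathbf{w}_E$ (the variance of $H_E$). Hereafter, to simplify the notation, we write
Eq. (7.46) as $\rho_P$ or $\lambda_P$.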
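The equivalence claimed for Eq. (7.44), and the statement that the covariance between $I_P$ and $H_P$ equals the variance of $I_P$, can both be checked numerically. The sketch below is only an illustration under stated assumptions: the matrices `P`, `C`, `U`, `D` and the vector `d` are made up (with $\mathbf{D}'\mathbf{d} = \mathbf{0}$), and the proportionality constants $\lambda_P$ and $\lambda_P^{-1}$ in Eq. (7.45) are set to 1, as the text assumes.

```python
import numpy as np

# Made-up inputs: t = 3 traits, r = 2 restricted traits, d' = [3, -1].
C = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
P = C + np.diag([6.0, 5.0, 4.0])
U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
d = np.array([3.0, -1.0])
D = np.array([[-1.0], [-3.0]])              # chosen so that D'd = 0

t = P.shape[0]
P_inv = np.linalg.inv(P)
Psi = C @ U                                 # Psi' = U'C
M = Psi @ D                                 # M' = D'Psi'
S = Psi.T @ P_inv @ Psi                     # S = Psi' P^{-1} Psi
S_inv = np.linalg.inv(S)

# K_P P^{-1} C built with D, as in Eq. (7.43).
Q = P_inv @ M @ np.linalg.inv(M.T @ P_inv @ M) @ M.T
lhs = (np.eye(t) - Q) @ P_inv @ C

# The same matrix built from Eq. (7.44), which uses d but not D.
delta = P_inv @ Psi @ S_inv @ d
alpha = (d @ S_inv @ Psi.T @ P_inv @ C) / (d @ S_inv @ d)
A_P = np.outer(delta, alpha)
rhs = (np.eye(t) - P_inv @ Psi @ S_inv @ Psi.T) @ P_inv @ C + A_P

# Eq. (7.45) with the proportionality constants set to 1.
vals, vecs = np.linalg.eig(lhs)
b = vecs[:, int(np.argmax(vals.real))].real
v = np.linalg.inv(M.T @ P_inv @ M) @ M.T @ P_inv @ C @ b
w = np.linalg.solve(C, P @ b + M @ v)

cov_HI = w @ C @ b                          # covariance between H_P and I_P
var_I = b @ P @ b                           # variance of I_P
accuracy = np.sqrt(var_I) / np.sqrt(w @ C @ w)   # Eq. (7.46)

print(np.allclose(lhs, rhs), cov_HI, var_I, accuracy)
```

Because the restriction $\mathbf{M}'\mathbf{b}_P = \mathbf{0}$ removes the term $\mathbf{v}_P'\mathbf{M}'\mathbf{b}_P$ from $\mathbf{w}_P'\mathbf{C}\mathbf{b}_P$, the two quantities `cov_HI` and `var_I` agree up to round-off.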

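Let $\boldsymbol{\beta}_P = \mathbf{F}\mathbf{b}_P$ be the PPG-ESIM vector of coefficients transformed by matrix $\mathbf{F}$.
By Eqs. (7.1) and (7.46), the maximized selection response ($R_P$) and expected
genetic gain per trait ($\mathbf{E}_P$) of the PPG-ESIM can be written as

$$R_P = k_I\sqrt{\boldsymbol{\beta}_P'\mathbf{P}\boldsymbol{\beta}_P} \tag{7.47}$$

and

$$\mathbf{E}_P = k_I\frac{\mathbf{C}\boldsymbol{\beta}_P}{\sqrt{\boldsymbol{\beta}_P'\mathbf{P}\boldsymbol{\beta}_P}}, \tag{7.48}$$

respectively, where $\sqrt{\boldsymbol{\beta}_P'\mathbf{P}\boldsymbol{\beta}_P} = \sigma_{I_P}$ is the standard deviation of $I_P = \boldsymbol{\beta}_P'\mathbf{y}$.
Equations (7.47) and (7.48) do not require economic weights. When $\mathbf{F}$ is
an identity matrix, $\boldsymbol{\beta}_P = \mathbf{b}_P$, $I_P = \mathbf{b}_P'\mathbf{y}$, $R_P = k_I\sqrt{\mathbf{b}_P'\mathbf{P}\mathbf{b}_P}$, and
$\mathbf{E}_P = k_I\dfrac{\mathbf{C}\mathbf{b}_P}{\sqrt{\mathbf{b}_P'\mathbf{P}\mathbf{b}_P}}$.

Equation (7.47) can also be written as $R_P = k_I\sigma_{H_P}\lambda_P$, where
$\sigma_{H_P} = \sqrt{\mathbf{b}_P'\mathbf{P}\mathbf{C}^{-1}\mathbf{P}\mathbf{b}_P + \mathbf{b}_P'\mathbf{P}\mathbf{C}^{-1}\mathbf{Q}_P'\mathbf{C}\mathbf{b}_P}$ is the standard deviation of $H_P$,
and $\lambda_P$ is the canonical correlation between $H_P$ and $I_P = \mathbf{b}_P'\mathbf{y}$. When $\sigma_{H_P} = 1$,
Eq. (7.47) can be written as $R_P = k_I\lambda_P$, where $\lambda_P$ is the covariance between
$I_P = \mathbf{b}_P'\mathbf{y}$ and $H_P = \mathbf{w}_P'\mathbf{g}$.

The prediction efficiency of the PPG-ESIM can be obtained in a similar manner to
the ESIM and RESIM. The accuracy of the PPG-ESIM (Eq. 7.46) can be used to
construct the ratio of index accuracies. The PPG-ESIM mean square error or the VPE
can be obtained as

$$E[(H_P - I_P)^2] = \sigma_{H_P}^2 + \sigma_{I_P}^2 - 2\sigma_{H_P I_P} = \sigma_{H_P}^2 - \sigma_{I_P}^2 = (1 - \rho_P^2)\sigma_{H_P}^2. \tag{7.49}$$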
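The chain of equalities in Eq. (7.49) can be spelled out step by step. The derivation below uses only two results stated earlier in this subsection, $\sigma_{H_P I_P} = \sigma_{I_P}^2$ and $\rho_P = \sigma_{I_P}/\sigma_{H_P}$ (Eq. 7.46), and assumes, as is usual in this setting, that $H_P$ and $I_P$ are expressed as deviations from their means.

```latex
\begin{aligned}
E\!\left[(H_P - I_P)^2\right]
  &= \sigma_{H_P}^2 + \sigma_{I_P}^2 - 2\sigma_{H_P I_P}
     && \text{(variance of a difference)} \\
  &= \sigma_{H_P}^2 + \sigma_{I_P}^2 - 2\sigma_{I_P}^2
     && \text{(because } \sigma_{H_P I_P} = \sigma_{I_P}^2\text{)} \\
  &= \sigma_{H_P}^2 - \sigma_{I_P}^2 \\
  &= \sigma_{H_P}^2\left(1 - \frac{\sigma_{I_P}^2}{\sigma_{H_P}^2}\right)
   = (1 - \rho_P^2)\,\sigma_{H_P}^2
     && \text{(because } \rho_P = \sigma_{I_P}/\sigma_{H_P}\text{)}
\end{aligned}
```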


Additional properties associated with the ESIM are also valid for the PPG-ESIM. 


7.3.2 Estimating PPG-ESIM Parameters 


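The procedure used to estimate PPG-ESIM parameters is the same as that described
for RESIM. Let $\hat{\mathbf{C}}$ and $\hat{\mathbf{P}}$ be the estimated matrices of $\mathbf{C}$ and $\mathbf{P}$. In the PPG-ESIM
context, we use matrix $\hat{\mathbf{S}} = \hat{\mathbf{K}}_P\hat{\mathbf{P}}^{-1}\hat{\mathbf{C}}$ to obtain the estimated eigenvalues and
eigenvectors of equation

$$(\hat{\mathbf{S}} - \hat{\lambda}_{P_j}^2\mathbf{I}_t)\hat{\mathbf{b}}_{P_j} = \mathbf{0}, \tag{7.50}$$

$j = 1, 2, \ldots, t$, where $t$ is the number of traits in the PPG-ESIM index,
$\hat{\mathbf{K}}_P = [\mathbf{I}_t - \hat{\mathbf{Q}}_P]$, $\mathbf{I}_t$ is an identity matrix of size $t \times t$ and
$\hat{\mathbf{Q}}_P = \hat{\mathbf{P}}^{-1}\hat{\boldsymbol{\Psi}}\mathbf{D}(\mathbf{D}'\hat{\boldsymbol{\Psi}}'\hat{\mathbf{P}}^{-1}\hat{\boldsymbol{\Psi}}\mathbf{D})^{-1}\mathbf{D}'\hat{\boldsymbol{\Psi}}'$. As $\hat{\mathbf{S}}$ is an asymmetric matrix, the values of
$\hat{\mathbf{b}}_{P_j}$ and $\hat{\lambda}_{P_j}^2$ should be obtained using SVD (singular value decomposition).
According to SVD, we need to solve equation

$$(\hat{\mathbf{S}}\hat{\mathbf{S}}' - \hat{\mu}_{P_j}\mathbf{I}_t)\hat{\mathbf{b}}_{P_j} = \mathbf{0}, \tag{7.51}$$

where $\hat{\mu}_{P_j} = \hat{\lambda}_{P_j}^4$, $j = 1, 2, \ldots, t$. By Eq. (7.51), the estimated PPG-ESIM index
($I_P = \mathbf{b}_P'\mathbf{y}$) is $\hat{I}_P = \hat{\mathbf{b}}_{P_1}'\mathbf{y}$. The estimator of the maximized PPG-ESIM selection
response, and its expected genetic gain per trait, can be denoted as

$$\hat{R}_P = k_I\sqrt{\hat{\mathbf{b}}_{P_1}'\hat{\mathbf{P}}\hat{\mathbf{b}}_{P_1}} \quad \text{and} \quad \hat{\mathbf{E}}_P = k_I\frac{\hat{\mathbf{C}}\hat{\mathbf{b}}_{P_1}}{\sqrt{\hat{\mathbf{b}}_{P_1}'\hat{\mathbf{P}}\hat{\mathbf{b}}_{P_1}}}$$

respectively, whereas the estimator of the maximized accuracy of the PPG-ESIM is $\hat{\lambda}_{P_1}$.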
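The estimation steps can be tried on the rounded matrix $\hat{\mathbf{S}}$ printed in the numerical example of Sect. 7.3.3. The following sketch follows Eq. (7.51) as stated (eigen-decomposition of $\hat{\mathbf{S}}\hat{\mathbf{S}}'$, then $\hat{\lambda}^2 = \hat{\mu}^{1/2}$); the variable names are illustrative, and because the input is rounded to four decimals the output can only be expected to agree with the printed values to about three decimals.

```python
import numpy as np

# S-hat = K_P P^{-1} C, as printed (rounded) in the numerical example of Sect. 7.3.3.
S = np.array([
    [ 0.1047, -0.0349, -0.0279],
    [ 0.0678, -0.0226, -0.0213],
    [-0.1970,  0.0657,  0.4119],
])

# Eq. (7.51): eigen-decomposition of the symmetric matrix S S'.
mu, V = np.linalg.eigh(S @ S.T)       # eigh returns eigenvalues in ascending order
order = np.argsort(mu)[::-1]          # reorder from largest to smallest
mu, V = mu[order], V[:, order]
mu = np.clip(mu, 0.0, None)           # guard against tiny negative round-off

lam_sq = np.sqrt(mu)                  # lambda^2 = mu^(1/2), because mu = lambda^4
accuracy = np.sqrt(lam_sq[0])         # estimated accuracy, lambda_{P1}

b1 = V[:, 0]
if b1[2] < 0:                         # eigenvector sign is arbitrary; use the printed sign
    b1 = -b1

F = np.diag([-9.0, 1.0, 1.0])         # transformation matrix used in the example
beta = F @ b1                         # beta_P = F b_P

print(np.round(mu, 4), np.round(lam_sq, 4), round(float(accuracy), 4))
print(np.round(b1, 4), np.round(beta, 4), round(float(beta @ beta), 2))
```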


7.3.3 Numerical Examples 


We compare the results of the PPG-LPSI and the PPG-ESIM using the Akbar et al. 
(1984) data described earlier. We restrict traits RL and SM, on both indices using the 
PPG vector d' = [3 —1]. In Chap. 3, Sect. 3.1.4, we indicated how to construct 
matrix U' and, in Sect. 3.2.4 of the same chapter, we described how to obtain matrix 
Kp for one and two restrictions. Matrix Kp is the same for the PPG-LPSI and the 
PPG-ESIM. Thus, we omit the steps for constructing matrices U' and Kp. 

Assume a selection intensity of 10% ($k_I = 1.755$) and that the vector of economic
weights is $\mathbf{w}' = [19.54 \; -3.56 \; 17.01]$. The estimated PPG-LPSI vector of coefficients
for two predetermined restrictions was $\hat{\mathbf{b}}' = [1.70 \; 1.04 \; 2.93]$, and its
estimated selection response, expected genetic gain per trait, accuracy, and heritability
were $\hat{R} = 1.755\sqrt{\hat{\mathbf{b}}'\hat{\mathbf{P}}\hat{\mathbf{b}}} = 49.02$, $\hat{\mathbf{E}}' = 1.755\dfrac{\hat{\mathbf{b}}'\hat{\mathbf{C}}}{\sqrt{\hat{\mathbf{b}}'\hat{\mathbf{P}}\hat{\mathbf{b}}}} = [1.25 \; -0.42 \; 1.36]$,
$\hat{\rho} = 0.24$, and $\hat{h}^2 = 0.12$ respectively. In this case, $\hat{\mathbf{b}}'\hat{\mathbf{b}} = 12.57$; then, the estimated
PPG-LPSI selection response using the normalized PPG-LPSI vector of coefficients was
$\hat{R} = \dfrac{49.02}{12.57} = 3.90$, whereas the rest of the
estimated PPG-LPSI parameters were the same.
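The normalization step is plain arithmetic and can be reproduced from the rounded figures given in the text. In the short sketch below the variable names are illustrative; since the printed coefficients are rounded to two decimals, the inner product comes out as 12.56 rather than the reported 12.57, which presumably reflects unrounded coefficients.

```python
# Rounded values taken from the text above.
b = [1.70, 1.04, 2.93]     # estimated PPG-LPSI vector of coefficients
R = 49.02                  # estimated response with the unnormalized vector

inner = sum(x * x for x in b)      # b'b
R_normalized = R / inner           # response reported for the normalized vector

print(round(inner, 2), round(R_normalized, 2))
```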

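In the PPG-ESIM, we need matrix $\hat{\mathbf{S}} = \hat{\mathbf{K}}_P\hat{\mathbf{P}}^{-1}\hat{\mathbf{C}}$ to obtain the eigenvalues
and eigenvectors of $(\hat{\mathbf{S}}\hat{\mathbf{S}}' - \hat{\mu}_{P_j}\mathbf{I}_t)\hat{\mathbf{b}}_{P_j} = \mathbf{0}$ that make up matrices $\hat{\mathbf{L}}^{1/2}$ and $\hat{\mathbf{V}}_P$
of the singular value decomposition of $\hat{\mathbf{S}}$, where $\hat{\mu}_{P_j} = \hat{\lambda}_{P_j}^4$. It can be shown that

$$\hat{\mathbf{S}} = \hat{\mathbf{K}}_P\hat{\mathbf{P}}^{-1}\hat{\mathbf{C}} = \begin{bmatrix} 0.1047 & -0.0349 & -0.0279 \\ 0.0678 & -0.0226 & -0.0213 \\ -0.1970 & 0.0657 & 0.4119 \end{bmatrix}, \quad \hat{\mathbf{S}}\hat{\mathbf{S}}' = \begin{bmatrix} 0.0130 & 0.0085 & -0.0344 \\ 0.0085 & 0.0056 & -0.0236 \\ -0.0344 & -0.0236 & 0.2118 \end{bmatrix},$$

$$\text{and} \quad \hat{\mathbf{V}}_P = \begin{bmatrix} -0.1663 & 0.8292 & 0.5336 \\ -0.1138 & 0.5214 & -0.8457 \\ 0.9795 & 0.2014 & -0.0076 \end{bmatrix},$$

whereas the $\hat{\mu}_{P_j}$ values were 0.2214, 0.0099, and 0.0, whence

$$\hat{\mathbf{L}}^{1/2} = \begin{bmatrix} 0.4705 & 0 & 0 \\ 0 & 0.0997 & 0 \\ 0 & 0 & 0.0 \end{bmatrix}.$$

Thus, $\hat{\mu}_{P_1} = \hat{\lambda}_{P_1}^4 = 0.2214$, $\hat{\lambda}_{P_1}^2 = 0.4705$, and the estimated maximized PPG-ESIM
accuracy was $\hat{\lambda}_{P_1} = 0.6859$. We transformed the first eigenvector
$\hat{\mathbf{b}}_{P_1}' = [-0.1663 \; -0.1138 \; 0.9795]$ using matrix

$$\mathbf{F} = \begin{bmatrix} -9 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

to obtain vector $\hat{\boldsymbol{\beta}}_P' = \hat{\mathbf{b}}_{P_1}'\mathbf{F} = [1.4968 \; -0.1138 \; 0.9795]$ and $\hat{\boldsymbol{\beta}}_P'\hat{\boldsymbol{\beta}}_P = 3.21$, whence the
estimates of the index, the selection response, and expected genetic gain per trait of the
PPG-ESIM were $\hat{I}_P = 1.4968\,\text{RL} - 0.1138\,\text{SM} + 0.9795\,\text{EW}$,
$\hat{R}_P = \dfrac{1.755\sqrt{\hat{\boldsymbol{\beta}}_P'\hat{\mathbf{P}}\hat{\boldsymbol{\beta}}_P}}{\hat{\boldsymbol{\beta}}_P'\hat{\boldsymbol{\beta}}_P} = \dfrac{43.01}{3.21} = 13.39$, and
$\hat{\mathbf{E}}_P' = 1.755\dfrac{\hat{\boldsymbol{\beta}}_P'\hat{\mathbf{C}}}{\sqrt{\hat{\boldsymbol{\beta}}_P'\hat{\mathbf{P}}\hat{\boldsymbol{\beta}}_P}} = [\ldots05 \; -1.96 \; 0.19]$ respectively. The estimated PPG-LPSI
selection response was $\hat{R} = \dfrac{49.02}{12.57} = 3.90$, which means that the estimated PPG-ESIM
selection response was greater than the estimated PPG-LPSI response.

We compared PPG-ESIM efficiency versus PPG-LPSI efficiency to predict the net
genetic merit using the ratio of the estimated PPG-ESIM accuracy ($\hat{\lambda}_{P_1} = 0.6859$) to
PPG-LPSI accuracy ($\hat{\rho} = 0.24$), i.e., $\dfrac{\hat{\lambda}_{P_1}}{\hat{\rho}} = \dfrac{0.6859}{0.24} = 2.858$ or, in percentage terms,
$\hat{p}_P = 100(2.858 - 1) = 185.80$. Then, the PPG-ESIM was a better predictor of the
net genetic merit and its efficiency was 185.80% higher than that of the PPG-LPSI
for this data set.

Now, we compare PPG-ESIM efficiency versus PPG-LPSI efficiency using the
data set described in Sect. 2.8.1 of Chap. 2 for five phenotypic selection cycles, each
with four traits (T1, T2, T3, and T4), 500 genotypes, and four replicates for each
genotype. The economic weights for T1, T2, T3, and T4 were 1, −1, 1, and 1 respectively.
For this data set, matrix $\mathbf{F}$ was an identity matrix of size $4 \times 4$ for all five
selection cycles.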

The first and second parts of columns 6, 7, and 8 in Table 7.1 present the
estimated PPG-LPSI and PPG-ESIM selection responses for one, two, and three
predetermined restrictions for five simulated selection cycles. The selection intensity
was 10% ($k_I = 1.755$) and the vectors of PPG for each predetermined restriction were
$\mathbf{d}_1' = 7$, $\mathbf{d}_2' = [7 \; -3]$, and $\mathbf{d}_3' = [7 \; -3 \; 5]$ respectively, for all five selection
cycles. The estimated PPG-LPSI selection response when the vector of coefficients
was not normalized was presented in Chap. 3 (Table 3.5). The averages of the
estimated PPG-LPSI selection response for each predetermined restriction were
4.70, 4.91, and 3.14, whereas the averages of the estimated PPG-ESIM selection
response were 6.31, 6.28, and 6.75 respectively. These results indicate that the
estimated PPG-ESIM selection response was greater than the estimated PPG-LPSI
selection response for all predetermined restrictions.

The second part of Table 7.2 presents the estimated PPG-ESIM accuracy ($\hat{\rho}_P$) and
the ratio of $\hat{\rho}_P$ to the estimated PPG-LPSI accuracy ($\hat{\rho}$), expressed in percentage
terms, $\hat{p}_P = 100(\hat{\lambda}_p - 1)$, where $\hat{\lambda}_p = \hat{\rho}_P/\hat{\rho}$, for one, two, and three predetermined
restrictions for five simulated selection cycles. The estimated PPG-LPSI accuracies
were presented in Chap. 3 (Table 3.6). The average estimated PPG-ESIM efficiency
for one, two, and three restrictions was 9.76%, 11.71%, and 29.03% greater, respectively,
than the PPG-LPSI efficiency for this data set in all five selection cycles.
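The percentage efficiency used in these comparisons is a one-line formula. A small sketch, with a hypothetical helper name, applies it to the two accuracies reported for the Akbar et al. (1984) example (0.6859 for the PPG-ESIM and 0.24 for the PPG-LPSI):

```python
def relative_efficiency(acc_eigen, acc_linear):
    """Return p = 100 * (acc_eigen / acc_linear - 1), in percent."""
    if acc_linear <= 0:
        raise ValueError("the reference accuracy must be positive")
    return 100.0 * (acc_eigen / acc_linear - 1.0)

# Accuracies reported for the real data example in this section.
p = relative_efficiency(0.6859, 0.24)
print(round(p, 1))
```

A value of zero means that both indices predict the net genetic merit equally well; a positive value favours the index in the numerator.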

The second part of Table 7.3 presents the estimated PPG-ESIM expected genetic
gain per trait for one, two, and three predetermined restrictions for five simulated
selection cycles. The estimated PPG-LPSI expected genetic gains per trait for one,
two, and three predetermined restrictions were presented in Chap. 3, Table 3.5,
where it can be seen that the averages of the estimated PPG-LPSI expected genetic
gains per trait for five simulated selection cycles were 6.85, −3.25, 2.62 and 1.48 for
one restriction; 6.93, −2.97, 2.65 and 1.45 for two restrictions; and 5.20, −2.23,
3.72 and 1.43 for three restrictions, whereas for the same set of restrictions, the
averages of the estimated PPG-ESIM expected genetic gain per trait were 5.67,
−2.67, 1.81, and 2.97 for one restriction; 5.89, −2.52, 2.04, and 2.83 for two
restrictions; and 5.71, −2.45, 4.08, and 0.82 for three restrictions (Table 7.3).
Because the vectors of predetermined proportional gains for each predetermined
restriction were $\mathbf{d}_1' = 7$, $\mathbf{d}_2' = [7 \; -3]$, and $\mathbf{d}_3' = [7 \; -3 \; 5]$, the averages of the
estimated PPG-LPSI expected genetic gains per trait were closer to the predetermined
values than those of the estimated PPG-ESIM expected genetic gains per trait for one
and two predetermined restrictions, whereas for three restrictions, the results of both
selection indices were similar.


Chapter 8
Linear Molecular and Genomic Eigen
Selection Index Methods


Abstract The three main linear phenotypic eigen selection index methods are the 
eigen selection index method (ESIM), the restricted ESIM (RESIM) and the 
predetermined proportional gain ESIM (PPG-ESIM). The ESIM is an unrestricted 
index, but the RESIM and PPG-ESIM allow null and predetermined restrictions 
respectively to be imposed on the expected genetic gains of some traits, whereas the 
rest remain without any restrictions. These indices are based on the canonical 
correlation, on the singular value decomposition, and on the linear phenotypic 
selection indices theory. We extended the ESIM theory to the molecular-assisted 
and genomic selection context to develop a molecular ESIM (MESIM), a genomic 
ESIM (GESIM), and a genome-wide ESIM (GW-ESIM). Also, we extend the 
RESIM and PPG-ESIM theory to the restricted genomic ESIM (RGESIM), and to 
the predetermined proportional gain genomic ESIM (PPG-GESIM) respectively. 
The latter five indices use marker and phenotypic information jointly to predict the 
net genetic merit of the candidates for selection, but although MESIM uses only 
statistically significant markers linked to quantitative trait loci, the GW-ESIM uses 
all genome markers and phenotypic information and the GESIM, RGESIM, and 
PPG-GESIM use the genomic estimated breeding values and the phenotypic values 
to predict the net genetic merit. Using real and simulated data, we validated the 
theoretical results of all five indices. 


8.1 The Molecular Eigen Selection Index Method 


The molecular eigen selection index method (MESIM) is very similar to the linear
molecular selection index (LMSI) described in Chap. 4; thus, it uses the same set of
information to predict the net genetic merit of individual candidates for selection,
and therefore needs the same set of conditions as those of the LMSI. The only
difference between the two indices is how the vector of coefficients is obtained and
the assumption associated with the vector of economic weights. Thus, although the
LMSI obtains the vector of coefficients according to the linear phenotypic selection
index (LPSI) described in Chap. 2 and assumes that the economic weights are known
and fixed, the MESIM assumes that the economic weights are unknown and fixed
and obtains the vector of coefficients according to the ESIM theory.


8.1.1 The MESIM Parameters 


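In the MESIM context, the net genetic merit can be written as

$$H = \mathbf{w}_1'\mathbf{g} + \mathbf{w}_2'\mathbf{s} = [\mathbf{w}_1' \;\; \mathbf{w}_2']\begin{bmatrix}\mathbf{g} \\ \mathbf{s}\end{bmatrix} = \mathbf{w}'\mathbf{a}, \tag{8.1}$$

where $\mathbf{g}' = [g_1 \; \ldots \; g_t]$ is the vector of true breeding values, $t$ is the number of
traits, $\mathbf{w}_1' = [w_1 \; \ldots \; w_t]$ is a vector of unknown economic weights associated
with $\mathbf{g}$, $\mathbf{w}_2' = [0_1 \; \ldots \; 0_t]$ is a null vector associated with the vector of marker
score values $\mathbf{s}' = [s_1 \; s_2 \; \ldots \; s_t]$, $\mathbf{w}' = [\mathbf{w}_1' \;\; \mathbf{w}_2']$ and $\mathbf{a}' = [\mathbf{g}' \;\; \mathbf{s}']$ (see Chap. 4 for
details). The MESIM index can be written as

$$I = \boldsymbol{\beta}_y'\mathbf{y} + \boldsymbol{\beta}_s'\mathbf{s} = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_s']\begin{bmatrix}\mathbf{y} \\ \mathbf{s}\end{bmatrix} = \boldsymbol{\beta}'\mathbf{t}, \tag{8.2}$$

where $\mathbf{y}' = [y_1 \; \ldots \; y_t]$ is the vector of phenotypic values; $\mathbf{s}' = [s_1 \; s_2 \; \ldots \; s_t]$
is the vector of marker scores; $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_s$ are vectors of phenotypic and marker score
weight values respectively, $\boldsymbol{\beta}' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_s']$ and $\mathbf{t}' = [\mathbf{y}' \;\; \mathbf{s}']$. The objectives of the
MESIM are the same as those of the ESIM (see Chap. 7 for details).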
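Let $\mathrm{Var}(H) = \mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w} = \sigma_H^2$ be the variance of $H$, $\mathrm{Var}(I) = \boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta} = \sigma_I^2$ the
variance of $I$, and $\mathrm{Cov}(H, I) = \mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta}$ the covariance between $H$ and $I$, where

$$\boldsymbol{\Psi}_M = \mathrm{Var}\begin{bmatrix}\mathbf{g} \\ \mathbf{s}\end{bmatrix} = \begin{bmatrix}\mathbf{C} & \mathbf{S}_M \\ \mathbf{S}_M & \mathbf{S}_M\end{bmatrix} \quad \text{and} \quad \mathbf{T}_M = \mathrm{Var}\begin{bmatrix}\mathbf{y} \\ \mathbf{s}\end{bmatrix} = \begin{bmatrix}\mathbf{P} & \mathbf{S}_M \\ \mathbf{S}_M & \mathbf{S}_M\end{bmatrix}$$

are block matrices of size $2t \times 2t$ ($t$ is the number of traits) of covariance matrices where
$\mathbf{P}$, $\mathbf{S}_M$, and $\mathbf{C}$ are covariance matrices $t \times t$ of phenotypic ($\mathbf{y}$), marker score ($\mathbf{s}$), and
genetic breeding ($\mathbf{g}$) values respectively. Let
$\rho_{HI} = \dfrac{\mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w}}\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}$ and $h_I^2 = \dfrac{\boldsymbol{\beta}'\boldsymbol{\Psi}_M\boldsymbol{\beta}}{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$ be the
correlation between $H$ and $I$, and the heritability of $I$ respectively; then, the MESIM
selection response can be written as

$$R = k_I\sigma_H\rho_{HI} \tag{8.3}$$

and

$$R = k_I\sigma_I h_I^2, \tag{8.4}$$

where $k_I$ is the standardized selection differential (or selection intensity) associated
with MESIM; $\sigma_H = \sqrt{\mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w}}$ and $\sigma_I = \sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}$ are the standard deviations of
$H$ and $I$ respectively. It is assumed that $k_I$ is fixed, and that matrices $\mathbf{T}_M$
and $\boldsymbol{\Psi}_M$ are known; therefore, we can maximize $R$ by maximizing $\rho_{HI}$ (Eq. 8.3)
with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$, or by maximizing $h_I^2$ (Eq. 8.4) only with respect to
vector $\boldsymbol{\beta}$.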
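As an illustration of how the block matrices $\mathbf{T}_M$ and $\boldsymbol{\Psi}_M$ are assembled and how $\rho_{HI}$ and $h_I^2$ follow from them, here is a minimal sketch. The covariance matrices, the weights `w` and the coefficients `beta` are made up for the purpose of the example (two traits), and the function names are hypothetical.

```python
import numpy as np

def mesim_blocks(P, C, S_M):
    """Assemble the 2t x 2t block matrices T_M and Psi_M."""
    T_M = np.block([[P, S_M], [S_M, S_M]])
    Psi_M = np.block([[C, S_M], [S_M, S_M]])
    return T_M, Psi_M

def correlation_and_heritability(w, beta, T_M, Psi_M):
    """Return (rho_HI, h2_I) for given vectors w and beta."""
    cov_HI = w @ Psi_M @ beta
    var_H = w @ Psi_M @ w
    var_I = beta @ T_M @ beta
    rho = cov_HI / np.sqrt(var_H * var_I)
    h2 = (beta @ Psi_M @ beta) / var_I
    return rho, h2

# Made-up covariance matrices for t = 2 traits (S_M < C < P in the positive definite sense).
S_M = np.array([[1.0, 0.2], [0.2, 0.8]])
C = S_M + np.array([[1.5, 0.3], [0.3, 1.0]])
P = C + np.diag([3.0, 2.0])

T_M, Psi_M = mesim_blocks(P, C, S_M)
w = np.array([1.0, -1.0, 0.0, 0.0])       # economic weights; zeros for the marker scores
beta = np.array([0.3, -0.2, 0.7, -0.8])   # an arbitrary vector of index coefficients

rho, h2 = correlation_and_heritability(w, beta, T_M, Psi_M)
print(T_M.shape, round(float(rho), 4), round(float(h2), 4))
```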

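Maximizing $h_I^2$ only with respect to $\boldsymbol{\beta}$ is simpler than maximizing $\rho_{HI}$ with respect
to $\mathbf{w}$ and $\boldsymbol{\beta}$; however, in the latter case the maximization process of $\rho_{HI}$ gives more
information associated with MESIM parameters than when $h_I^2$ is maximized only
with respect to $\boldsymbol{\beta}$ (see Chap. 7, Eq. 7.13, for details). In this subsection, we maximize
$\rho_{HI}$ with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ similar to the ESIM in Chap. 7, Sect. 7.1.1. Thus,
we omit the steps and details of the maximization process of $\rho_{HI}$.

We maximize $\rho_{HI} = \dfrac{\mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w}}\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}$ with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ under the
restrictions $\sigma_H^2 = \mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w}$, $\sigma_I^2 = \boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}$, and $0 < \sigma_H^2, \sigma_I^2 < \infty$, where $\sigma_H^2$ is the variance of
$H = \mathbf{w}'\mathbf{a}$ and $\sigma_I^2$ is the variance of $I = \boldsymbol{\beta}'\mathbf{t}$. Thus, it is necessary to maximize the
function

$$f(\boldsymbol{\beta}, \mathbf{w}, \mu, \phi) = \mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta} - 0.5\mu(\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta} - \sigma_I^2) - 0.5\phi(\mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w} - \sigma_H^2) \tag{8.5}$$

with respect to $\boldsymbol{\beta}$, $\mathbf{w}$, $\mu$, and $\phi$, where $\mu$ and $\phi$ are Lagrange multipliers. The
derivatives of Eq. (8.5) with respect to $\boldsymbol{\beta}$, $\mathbf{w}$, $\mu$, and $\phi$ are:

$$\boldsymbol{\Psi}_M\mathbf{w} - \mu\mathbf{T}_M\boldsymbol{\beta} = \mathbf{0}, \tag{8.6}$$

$$\boldsymbol{\Psi}_M\boldsymbol{\beta} - \phi\boldsymbol{\Psi}_M\mathbf{w} = \mathbf{0}, \tag{8.7}$$

$$\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta} = \sigma_I^2 \quad \text{and} \quad \mathbf{w}'\boldsymbol{\Psi}_M\mathbf{w} = \sigma_H^2, \tag{8.8}$$

respectively, where Eq. (8.8) denotes the restrictions imposed for maximizing $\rho_{HI}$. It
can be shown (see Chap. 7) that vector $\mathbf{w}$ can be obtained as

$$\mathbf{w}_M = \boldsymbol{\Psi}_M^{-1}\mathbf{T}_M\boldsymbol{\beta} \tag{8.9}$$

and the net genetic merit in the MESIM context can be written as $H_M = \mathbf{w}_M'\mathbf{a}$; thus,
the correlation between $H_M = \mathbf{w}_M'\mathbf{a}$ and $I$ is $\rho_{H_M I} = \dfrac{\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\beta}}}{\sqrt{\boldsymbol{\beta}'\mathbf{T}_M\boldsymbol{\Psi}_M^{-1}\mathbf{T}_M\boldsymbol{\beta}}}$ and the MESIM vector
of coefficients ($\boldsymbol{\beta}$) that maximizes $\rho_{H_M I}$ can be obtained from equation

$$(\mathbf{T}_M^{-1}\boldsymbol{\Psi}_M - \lambda_M^2\mathbf{I}_{2t})\boldsymbol{\beta}_M = \mathbf{0}, \tag{8.10}$$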


where $\mathbf{I}_{2t}$ is an identity matrix of size $2t \times 2t$ ($t$ is the number of traits), and $\lambda_M^2$ and $\boldsymbol{\beta}_M$
are the eigenvalue and eigenvector of matrix $\mathbf{T}_M^{-1}\boldsymbol{\Psi}_M$. The words eigenvalue and
eigenvector are derived from the German word eigen, which means owned by or
peculiar to. Eigenvalues and eigenvectors are sometimes called characteristic values
and characteristic vectors, proper values and proper vectors, or latent values and
latent vectors (Meyer 2000). The square root of $\lambda_M^2$ ($\lambda_M$) is the canonical correlation
between $H_M = \mathbf{w}_M'\mathbf{a}$ and $I_M = \boldsymbol{\beta}_M'\mathbf{t}$ and the optimized MESIM index can be written
as $I_M = \boldsymbol{\beta}_M'\mathbf{t}$. Using a similar procedure to that described in Chap. 7 (Eq. 7.17), it can
be shown that vector $\boldsymbol{\beta}_M$ can be transformed into $\boldsymbol{\beta}_C = \mathbf{F}\boldsymbol{\beta}_M$, where $\mathbf{F}$ is a diagonal
matrix with values equal to any real number, except zero values.
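Because the intermediate steps are omitted in the text, one possible route from the derivatives of Eq. (8.5) to Eqs. (8.9) and (8.10) is sketched here. It assumes that $\mathbf{T}_M$ and $\boldsymbol{\Psi}_M$ are nonsingular, writes the two first-order conditions as $\boldsymbol{\Psi}_M\mathbf{w} = \mu\mathbf{T}_M\boldsymbol{\beta}$ (8.6) and $\boldsymbol{\Psi}_M\boldsymbol{\beta} = \phi\boldsymbol{\Psi}_M\mathbf{w}$ (8.7), and treats $\mu$ as a proportionality constant in Eq. (8.9), as is done for the ESIM in Chap. 7.

```latex
\begin{aligned}
&\text{From (8.6):} &
  \boldsymbol{\Psi}_M\mathbf{w} &= \mu\,\mathbf{T}_M\boldsymbol{\beta}
  \;\Longrightarrow\;
  \mathbf{w} = \mu\,\boldsymbol{\Psi}_M^{-1}\mathbf{T}_M\boldsymbol{\beta}
  \;\propto\; \boldsymbol{\Psi}_M^{-1}\mathbf{T}_M\boldsymbol{\beta}. \\[4pt]
&\text{Substituting into (8.7):} &
  \boldsymbol{\Psi}_M\boldsymbol{\beta} &= \phi\,\boldsymbol{\Psi}_M\mathbf{w}
  = \mu\phi\,\mathbf{T}_M\boldsymbol{\beta}
  \;\Longrightarrow\;
  (\mathbf{T}_M^{-1}\boldsymbol{\Psi}_M - \mu\phi\,\mathbf{I}_{2t})\boldsymbol{\beta} = \mathbf{0}. \\[4pt]
&\text{Premultiplying (8.6) by } \boldsymbol{\beta}' : &
  \boldsymbol{\beta}'\boldsymbol{\Psi}_M\mathbf{w} &= \mu\,\sigma_I^2, \\
&\text{premultiplying (8.7) by } \mathbf{w}' : &
  \mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta} &= \phi\,\sigma_H^2, \\[4pt]
&\text{so that} &
  \mu\phi &= \frac{(\mathbf{w}'\boldsymbol{\Psi}_M\boldsymbol{\beta})^2}{\sigma_H^2\,\sigma_I^2}
  = \rho_{HI}^2 = \lambda_M^2 .
\end{aligned}
```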

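The maximized correlation between $H_M = \mathbf{w}_M'\mathbf{a}$ and $I_M = \boldsymbol{\beta}_M'\mathbf{t}$, or MESIM
accuracy, is

$$\rho_{H_M I_M} = \frac{\sqrt{\boldsymbol{\beta}_M'\mathbf{T}_M\boldsymbol{\beta}_M}}{\sqrt{\boldsymbol{\beta}_M'\mathbf{T}_M\boldsymbol{\Psi}_M^{-1}\mathbf{T}_M\boldsymbol{\beta}_M}} = \frac{\sigma_{I_M}}{\sigma_{H_M}}, \tag{8.11}$$

where $\sigma_{I_M} = \sqrt{\boldsymbol{\beta}_M'\mathbf{T}_M\boldsymbol{\beta}_M}$ is the standard deviation of $I_M = \boldsymbol{\beta}_M'\mathbf{t}$, and
$\sigma_{H_M} = \sqrt{\boldsymbol{\beta}_M'\mathbf{T}_M\boldsymbol{\Psi}_M^{-1}\mathbf{T}_M\boldsymbol{\beta}_M}$ is the standard deviation of $H_M = \mathbf{w}_M'\mathbf{a}$.

The maximized selection response and expected genetic gain per trait of MESIM
are

$$R_M = k_I\sqrt{\boldsymbol{\beta}_{M_1}'\mathbf{T}_M\boldsymbol{\beta}_{M_1}} \tag{8.12}$$

and

$$\mathbf{E}_M = k_I\frac{\boldsymbol{\Psi}_M\boldsymbol{\beta}_{M_1}}{\sqrt{\boldsymbol{\beta}_{M_1}'\mathbf{T}_M\boldsymbol{\beta}_{M_1}}}, \tag{8.13}$$

respectively, where $\boldsymbol{\beta}_{M_1}$ is the first eigenvector of matrix $\mathbf{T}_M^{-1}\boldsymbol{\Psi}_M$. If vector $\boldsymbol{\beta}_{M_1}$ is
multiplied by matrix $\mathbf{F}$, we obtain $\boldsymbol{\beta}_{C_1} = \mathbf{F}\boldsymbol{\beta}_{M_1}$; in this case, we can replace $\boldsymbol{\beta}_{M_1}$ with
$\boldsymbol{\beta}_{C_1} = \mathbf{F}\boldsymbol{\beta}_{M_1}$ in Eqs. (8.12) and (8.13), and the optimized MESIM index should be
written as $I_M = \boldsymbol{\beta}_{C_1}'\mathbf{t}$.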

8.1.2 Estimating MESIM Parameters 


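We estimate the MESIM parameters using the same procedure described in Chap. 7
(Sect. 7.1.4) to estimate the ESIM parameters. Let $\hat{\mathbf{C}}$, $\hat{\mathbf{P}}$, and $\hat{\mathbf{S}}_M$ be the estimates of
the genotypic, phenotypic, and marker scores covariance matrices,
$\hat{\mathbf{T}}_M = \begin{bmatrix}\hat{\mathbf{P}} & \hat{\mathbf{S}}_M \\ \hat{\mathbf{S}}_M & \hat{\mathbf{S}}_M\end{bmatrix}$ and $\hat{\boldsymbol{\Psi}}_M = \begin{bmatrix}\hat{\mathbf{C}} & \hat{\mathbf{S}}_M \\ \hat{\mathbf{S}}_M & \hat{\mathbf{S}}_M\end{bmatrix}$ the estimated block matrices (Chap. 4)
and $\hat{\mathbf{W}} = \hat{\mathbf{T}}_M^{-1}\hat{\boldsymbol{\Psi}}_M$; then, to find the estimators $\hat{\boldsymbol{\beta}}_{M_1}$ and $\hat{\lambda}_{M_1}^2$ of the first eigenvector
($\boldsymbol{\beta}_{M_1}$) and the first eigenvalue ($\lambda_{M_1}^2$) respectively, we need to solve the equation

$$(\hat{\mathbf{W}}\hat{\mathbf{W}}' - \hat{\mu}_j\mathbf{I}_{2t})\hat{\boldsymbol{\beta}}_{M_j} = \mathbf{0}, \tag{8.14}$$

where $\hat{\mu}_j = \hat{\lambda}_{M_j}^4$, $j = 1, 2, \ldots, 2t$. For additional details, see Eqs. (7.22) and (7.23),
and Sect. 7.1.5 of Chap. 7. The result of Eq. (8.14) allows the MESIM index
($I_M = \boldsymbol{\beta}_{M_1}'\mathbf{t}$) to be estimated as $\hat{I}_M = \hat{\boldsymbol{\beta}}_{M_1}'\mathbf{t}$, whereas the estimator of the maximized
MESIM selection response and its expected genetic gain per trait can be
denoted by

$$\hat{R}_M = k_I\sqrt{\hat{\boldsymbol{\beta}}_{M_1}'\hat{\mathbf{T}}_M\hat{\boldsymbol{\beta}}_{M_1}} \quad \text{and} \quad \hat{\mathbf{E}}_M = k_I\frac{\hat{\boldsymbol{\Psi}}_M\hat{\boldsymbol{\beta}}_{M_1}}{\sqrt{\hat{\boldsymbol{\beta}}_{M_1}'\hat{\mathbf{T}}_M\hat{\boldsymbol{\beta}}_{M_1}}} \tag{8.15}$$

respectively.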
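The estimation procedure of Eqs. (8.14) and (8.15) can be sketched in a few lines. The function below is an illustrative assumption about how these steps fit together, not the implementation used for the results in this chapter; its name, the dictionary it returns, and the made-up input matrices are all hypothetical.

```python
import numpy as np

def mesim_estimate(P_hat, C_hat, S_hat, k_I=1.755):
    """Follow Eqs. (8.14)-(8.15): first eigenvector of W W', response and gains."""
    T = np.block([[P_hat, S_hat], [S_hat, S_hat]])
    Psi = np.block([[C_hat, S_hat], [S_hat, S_hat]])
    W = np.linalg.solve(T, Psi)               # W = T^{-1} Psi
    mu, V = np.linalg.eigh(W @ W.T)           # Eq. (8.14); W W' is symmetric
    order = np.argsort(mu)[::-1]              # largest eigenvalue first
    mu, V = np.clip(mu[order], 0.0, None), V[:, order]
    beta = V[:, 0]                            # estimated first vector of coefficients
    lam_sq = np.sqrt(mu[0])                   # lambda^2 = mu^(1/2), since mu = lambda^4
    sd_I = np.sqrt(beta @ T @ beta)
    R = k_I * sd_I                            # estimated selection response
    E = k_I * (Psi @ beta) / sd_I             # estimated expected genetic gains
    return {"beta": beta, "lambda_sq": lam_sq, "R": R, "E": E, "W": W, "mu": mu}

# Made-up estimates for t = 2 traits.
S_hat = np.array([[1.0, 0.2], [0.2, 0.8]])
C_hat = S_hat + np.array([[1.5, 0.3], [0.3, 1.0]])
P_hat = C_hat + np.diag([3.0, 2.0])

res = mesim_estimate(P_hat, C_hat, S_hat)
print(np.round(res["beta"], 4), round(float(res["R"]), 4), np.round(res["E"], 4))
```

The sign of the returned vector is arbitrary, so in practice a transformation such as $\boldsymbol{\beta}_{C_1} = \mathbf{F}\boldsymbol{\beta}_{M_1}$ may still be required before the index is used.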


8.1.3 Numerical Examples 


To validate the MESIM theoretical results, we use a real maize (Zea mays) F2
population with 247 genotypes (each with two repetitions), 195 molecular markers,
and two traits, plant height (PHT, cm) and ear height (EHT, cm), evaluated in one
environment. We coded the marker homozygous loci for the allele from the first
parental line by 1, whereas the marker homozygous loci for the allele from the
second parental line was coded by −1 and the marker heterozygous loci by 0. The
estimated phenotypic, genetic, and marker scores covariance matrices were

$$\hat{\mathbf{P}} = \begin{bmatrix} \ldots & 106.89 \\ 106.89 & 167.93 \end{bmatrix}, \quad \hat{\mathbf{C}} = \begin{bmatrix} \ldots & 57.44 \\ 57.44 & 59.80 \end{bmatrix}, \quad \hat{\mathbf{S}}_M = \begin{bmatrix} \ldots & 0.983 \\ 0.983 & 28.083 \end{bmatrix}$$

respectively, and the vector of economic weights was $\mathbf{a}' = [\mathbf{w}' \;\; \mathbf{0}']$, where
$\mathbf{w}' = [-1 \; -1]$ and $\mathbf{0}' = [0 \; 0]$. Details of how to estimate the marker scores and their
variance were given in Chap. 4.

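We compare LMSI versus MESIM efficiency. The estimated LMSI vector of
coefficients was $\hat{\boldsymbol{\beta}}' = \mathbf{a}'\hat{\boldsymbol{\Psi}}_M\hat{\mathbf{T}}_M^{-1} = [-0.59 \; -0.18 \; -0.41 \; -0.82]$. Using a
10% selection intensity ($k_I = 1.755$), the estimated LMSI selection response and
the expected genetic gain per trait were $\hat{R} = k_I\sqrt{\hat{\boldsymbol{\beta}}'\hat{\mathbf{T}}_M\hat{\boldsymbol{\beta}}} = 20.41$ and
$\hat{\mathbf{E}}' = k_I\dfrac{\hat{\boldsymbol{\beta}}'\hat{\boldsymbol{\Psi}}_M}{\sqrt{\hat{\boldsymbol{\beta}}'\hat{\mathbf{T}}_M\hat{\boldsymbol{\beta}}}} = [-10.09 \; -10.31 \; -2.53 \; -4.39]$ respectively, whereas the
estimated LMSI accuracy was $\hat{\rho}_{HI} = \dfrac{\hat{\sigma}_I}{\hat{\sigma}_H} = 0.72$.

Vector $\hat{\boldsymbol{\beta}}_{M_1}' = [0.089 \; -0.061 \; -0.536 \; 0.837]$ was the original estimated
MESIM vector of coefficients. Using matrix

$$\mathbf{F} = \begin{bmatrix} -0.1 & 0 & 0 & 0 \\ 0 & -0.1 & 0 & 0 \\ 0 & 0 & 0.75 & 0 \\ 0 & 0 & 0 & -0.75 \end{bmatrix},$$

vector $\hat{\boldsymbol{\beta}}_{M_1}'$ was transformed as $\hat{\boldsymbol{\beta}}_{C_1}' = \hat{\boldsymbol{\beta}}_{M_1}'\mathbf{F} = [-0.009 \; 0.006 \; -0.402 \; 0.628]$
and then the estimated MESIM index was $\hat{I}_M = -0.009\,\text{PHT} + 0.006\,\text{EHT} -
0.402\,S_{PHT} + 0.628\,S_{EHT}$, where $S_{PHT}$ and $S_{EHT}$ denote the marker scores
associated with PHT and EHT respectively. The estimated MESIM expected
genetic gain, selection response, and accuracy were
$\hat{\mathbf{E}}_M' = k_I\dfrac{\hat{\boldsymbol{\beta}}_{C_1}'\hat{\boldsymbol{\Psi}}_M}{\sqrt{\hat{\boldsymbol{\beta}}_{C_1}'\hat{\mathbf{T}}_M\hat{\boldsymbol{\beta}}_{C_1}}} = [-3.438 \; -8.516 \; -3.319 \; -8.372]$,
$\hat{R}_M = k_I\sqrt{\hat{\boldsymbol{\beta}}_{C_1}'\hat{\mathbf{T}}_M\hat{\boldsymbol{\beta}}_{C_1}} = 6.573$ and
$\hat{\rho}_{H_M I_M} = \dfrac{\hat{\sigma}_{I_M}}{\hat{\sigma}_{H_M}} = 0.99$ respectively.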

The inner products of the estimated LMSI and MESIM vectors of coefficients were
1.221 and 0.556 respectively, whence the estimated LMSI selection response (20.41)
divided by 1.221 was 16.716, and the estimated MESIM selection response (6.573) 
divided by 0.556 was 11.821. That is, the estimated LMSI selection response was 
higher than the estimated MESIM selection response for this data set. Similar results 
were found when we compared the estimated LMSI expected genetic gain per trait 
with the estimated MESIM expected genetic gain per trait. Finally, Fig. 8.1 presents 
the frequency distribution of the 247 estimated MESIM values for the real data set 
described earlier, which approaches normal distribution, as we would expect. 
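The normalization used in this comparison can be reproduced from the rounded figures given above. In the sketch below the variable and function names are illustrative; the small difference from the reported 11.821 for the MESIM arises because the text divides by the rounded inner product 0.556.

```python
# Rounded coefficient vectors and responses taken from the text above.
lmsi_beta = [-0.59, -0.18, -0.41, -0.82]
mesim_beta = [-0.009, 0.006, -0.402, 0.628]
lmsi_response, mesim_response = 20.41, 6.573

def inner_product(v):
    """Return v'v for a plain list of numbers."""
    return sum(x * x for x in v)

lmsi_norm = lmsi_response / inner_product(lmsi_beta)
mesim_norm = mesim_response / inner_product(mesim_beta)

print(round(inner_product(lmsi_beta), 3), round(inner_product(mesim_beta), 3))
print(round(lmsi_norm, 3), round(mesim_norm, 3))
```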

Now with a selection intensity of 10% ($k_I = 1.755$), we compare the LMSI and
MESIM efficiency using the simulated data set described in Sect. 2.8.1 of Chap. 2 for
four phenotypic selection cycles, each with four traits (T1, T2, T3, and T4), 500 genotypes,
and four replicates of each genotype. The economic weights for T1, T2, T3, and
T4 were 1, −1, 1, and 1 respectively. For this data set, we did not use the linear
transformation $\boldsymbol{\beta}_{C_1} = \mathbf{F}\boldsymbol{\beta}_{M_1}$.

The estimated selection responses of the linear marker, combined genomic and
genome-wide selection indices (LMSI, CLGSI, and GW-LMSI respectively; see
Chaps. 4 and 5 for details) for four simulated selection cycles when their vectors of
coefficients were normalized, are presented in Table 8.1. Also, in this table the
selection responses of the estimated linear molecular, genomic, and genome-wide
eigen selection index methods (MESIM, GESIM, and GW-ESIM respectively;
details in Sect. 8.2) are shown for four simulated selection cycles. The average of
the estimated LMSI selection response was 2.22, whereas the average of the
estimated MESIM selection response was 1.69. The estimated LMSI selection
response was higher than that of the MESIM.

Fig. 8.1 Frequency distribution of 247 estimated molecular eigen selection index method
(MESIM) values for one selection cycle in an environment for a real maize (Zea mays) F2
population with 195 molecular markers and two traits, plant height (PHT, cm) and ear height
(EHT, cm), and their associated marker scores $S_{PHT}$ and $S_{EHT}$ respectively

Table 8.2 presents the estimated LMSI and MESIM expected genetic gains for
four traits (T1, T2, T3, and T4) and their associated marker scores (S1, S2, S3, and
S4) for four simulated selection cycles. The averages of the estimated LMSI
expected genetic gains for the four traits and their associated marker scores were
12.74, −2.10, 1.60, 0.94, 5.70, −2.19, 0.71, and 0.64 respectively, whereas the
averages of the estimated MESIM expected genetic gains for the four traits and their
associated marker scores were 14.40, −0.38, −0.39, 0.34, 8.65, 0.47, −0.21, and
−0.70 respectively. Except for trait T1 and its associated molecular scores, the
estimated LMSI expected genetic gains per trait were higher than the estimated
MESIM expected genetic gains. Thus, for this data set, LMSI efficiency was greater
than MESIM efficiency.


Table 8.1 Estimated linear molecular, combined genomic, and genome-wide selection index
(LMSI, CLGSI, and GW-LMSI respectively) selection responses when their vectors of coefficients
are normalized, and estimated linear molecular, genomic, and genome-wide eigen selection index
method (MESIM, GESIM, and GW-ESIM respectively) selection responses, for four simulated
selection cycles. The selection intensity was 10% ($k_I = 1.755$)

Estimated selection response
Cycle     LMSI    CLGSI   GW-LMSI   MESIM   GESIM   GW-ESIM
1         0.02    1.24    0.93      0.50    3.95    0.73
2         4.94    0.80    0.80      1.21    3.07    1.06
3         3.69    0.34    0.93      3.91    2.05    0.77
4         0.23    0.35    0.83      1.15    1.90    1.14
Average   2.22    0.68    0.87      1.69    2.74    0.93


Table 8.2 Estimated linear molecular selection index (LMSI) and estimated linear molecular eigen
selection index method (MESIM) expected genetic gains for four traits (T1, T2, T3, and T4) and
their associated marker scores (S1, S2, S3, and S4) for four simulated selection cycles. The selection
intensity was 10% ($k_I = 1.755$)

Estimated LMSI expected genetic gain
          Traits                              Marker scores
Cycle     T1      T2      T3      T4          S1      S2      S3      S4
1         24.48   −0.01   0.74    −0.87       4.18    −1.14   0.72    0.79
2         7.14    −3.39   2.62    1.55        3.78    −2.30   1.02    1.37
3         9.17    −3.04   1.87    1.21        6.22    −1.51   1.02    0.26
4         10.16   −1.95   1.17    1.88        8.63    −3.83   0.09    0.13
Average   12.74   −2.10   1.60    0.94        5.70    −2.19   0.71    0.64

Estimated MESIM expected genetic gain
          Traits                              Marker scores
Cycle     T1      T2      T3      T4          S1      S2      S3      S4
1         27.48   2.60    −1.03   −2.64               0.00    −0.04   −0.43
2         8.82    −4.75   0.37    2.11                4.09    0.38    −2.76
3         9.83    1.74    0.72    0.37                1.76    0.31    0.34
4         11.47   −1.13   −1.64   1.53        8.66    −3.96   −1.47   0.04
Average   14.40   −0.38   −0.39   0.34        8.65    0.47    −0.21   −0.70

Chapter 11 presents RIndSel, a user-friendly graphical user interface in Java
that is useful for estimating the LMSI and ESIM parameters and selecting parents for
the next selection cycle.


8.2 The Linear Genomic Eigen Selection Index Method 


The linear genomic eigen selection index method (GESIM) is based on the standard 
CLGSI described in Chap. 5, and uses genomic estimated breeding values (GEBVs) 
and phenotypic values jointly to predict the net genetic merit. Thus, conditions for 
constructing a valid GESIM are the same as those for constructing the CLGSI. Also, 
the MESIM theory described in Sect. 8.1 is directly applied to the GESIM and only 
minor changes are necessary in GESIM theory. For example, instead of marker 
scores, the GESIM uses GEBVs to predict the net genetic merit; thus, the details of 
the estimation process are the same as for the MESIM. 


8.2.1 The GESIM Parameters 


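In the GESIM context, the net genetic merit can be written as

$$H = \mathbf{w}_1'\mathbf{g} + \mathbf{w}_2'\boldsymbol{\gamma} = [\mathbf{w}_1' \;\; \mathbf{w}_2']\begin{bmatrix}\mathbf{g} \\ \boldsymbol{\gamma}\end{bmatrix} = \mathbf{w}'\boldsymbol{\alpha}, \tag{8.16}$$

where $\mathbf{g}' = [g_1 \; \ldots \; g_t]$ is the vector of true breeding values, $t$ is the number of
traits, $\mathbf{w}_1' = [w_1 \; \ldots \; w_t]$ is a vector of unknown economic weights associated
with $\mathbf{g}$, $\mathbf{w}_2' = [0_1 \; \ldots \; 0_t]$ is a null vector associated with the vector of genomic
breeding values $\boldsymbol{\gamma}' = [\gamma_1 \; \gamma_2 \; \ldots \; \gamma_t]$, $\mathbf{w}' = [\mathbf{w}_1' \;\; \mathbf{w}_2']$, and $\boldsymbol{\alpha}' = [\mathbf{g}' \;\; \boldsymbol{\gamma}']$. The
estimator of $\boldsymbol{\gamma}$ is the GEBV (see Chap. 5 for additional details). The GESIM index
can be written as

$$I = \boldsymbol{\beta}_y'\mathbf{y} + \boldsymbol{\beta}_\gamma'\boldsymbol{\gamma} = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_\gamma']\begin{bmatrix}\mathbf{y} \\ \boldsymbol{\gamma}\end{bmatrix} = \boldsymbol{\beta}'\mathbf{f}, \tag{8.17}$$

where $\mathbf{y}' = [y_1 \; \ldots \; y_t]$ is the vector of phenotypic values; $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_\gamma$ are vectors of
weights of phenotypic and genomic breeding values respectively;
$\boldsymbol{\beta}' = [\boldsymbol{\beta}_y' \;\; \boldsymbol{\beta}_\gamma']$ and $\mathbf{f}' = [\mathbf{y}' \;\; \boldsymbol{\gamma}']$.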

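Let $\mathrm{Var}(H) = \mathbf{w}'\mathbf{A}\mathbf{w} = \sigma_H^2$ be the variance of $H = \mathbf{w}'\boldsymbol{\alpha}$, $\mathrm{Var}(I) = \boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta} = \sigma_I^2$ the
variance of $I = \boldsymbol{\beta}'\mathbf{f}$, and $\mathrm{Cov}(H, I) = \mathbf{w}'\mathbf{A}\boldsymbol{\beta} = \sigma_{HI}$ the covariance between $H$ and $I$,
where

$$\mathbf{A} = \mathrm{Var}\begin{bmatrix}\mathbf{g} \\ \boldsymbol{\gamma}\end{bmatrix} = \begin{bmatrix}\mathbf{C} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\end{bmatrix} \quad \text{and} \quad \boldsymbol{\Phi} = \mathrm{Var}\begin{bmatrix}\mathbf{y} \\ \boldsymbol{\gamma}\end{bmatrix} = \begin{bmatrix}\mathbf{P} & \boldsymbol{\Gamma} \\ \boldsymbol{\Gamma} & \boldsymbol{\Gamma}\end{bmatrix}$$

are block matrices $2t \times 2t$ ($t$ is the number of traits) of covariance matrices and $\mathbf{P}$, $\boldsymbol{\Gamma}$, and $\mathbf{C}$ are
covariance matrices of phenotypic ($\mathbf{y}$), genomic ($\boldsymbol{\gamma}$), and genetic ($\mathbf{g}$) values respectively.
Then, $\rho_{HI} = \dfrac{\mathbf{w}'\mathbf{A}\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\mathbf{A}\mathbf{w}}\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}$ is the correlation between $H = \mathbf{w}'\boldsymbol{\alpha}$ and $I = \boldsymbol{\beta}'\mathbf{f}$ and
the GESIM selection response can be written as

$$R = k_I\sigma_H\rho_{HI}, \tag{8.18}$$

where $k_I$ is the standardized selection differential (or selection intensity) associated
with the GESIM and $\sigma_H = \sqrt{\mathbf{w}'\mathbf{A}\mathbf{w}}$ is the standard deviation of $H$. It
is assumed that $k_I$ is fixed, and that matrices $\boldsymbol{\Phi}$ and $\mathbf{A}$ are known; then, we can
maximize $R$ by maximizing $\rho_{HI}$ with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ under the restrictions
$\sigma_H^2 = \mathbf{w}'\mathbf{A}\mathbf{w}$, $\sigma_I^2 = \boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}$, and $0 < \sigma_H^2, \sigma_I^2 < \infty$, similar to the MESIM.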

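It can be shown that the vector $\mathbf{w}$ in the GESIM context is

$$\mathbf{w}_G = \mathbf{A}^{-1}\boldsymbol{\Phi}\boldsymbol{\beta} \tag{8.19}$$

and that the net genetic merit can be written as $H_G = \mathbf{w}_G'\boldsymbol{\alpha}$. The correlation between
$H_G = \mathbf{w}_G'\boldsymbol{\alpha}$ and $I = \boldsymbol{\beta}'\mathbf{f}$ is $\rho_{H_G I} = \dfrac{\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}{\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}\boldsymbol{\beta}}}$ and the GESIM index vector of
coefficients that maximizes $\rho_{H_G I}$ can be obtained from the equation

$$(\boldsymbol{\Phi}^{-1}\mathbf{A} - \lambda_G^2\mathbf{I}_{2t})\boldsymbol{\beta}_G = \mathbf{0}, \tag{8.20}$$

where $\mathbf{I}_{2t}$ is an identity matrix of size $2t \times 2t$ ($t$ is the number of traits); the optimized
GESIM index can be written as $I_G = \boldsymbol{\beta}_G'\mathbf{f}$. By Eqs. (8.19) and (8.20), GESIM
accuracy can be written as

$$\rho_{H_G I_G} = \frac{\sigma_{I_G}}{\sigma_{H_G}}, \tag{8.21}$$

where $\sigma_{I_G} = \sqrt{\boldsymbol{\beta}_G'\boldsymbol{\Phi}\boldsymbol{\beta}_G}$ is the standard deviation of $I_G = \boldsymbol{\beta}_G'\mathbf{f}$, and
$\sigma_{H_G} = \sqrt{\boldsymbol{\beta}_G'\boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}\boldsymbol{\beta}_G}$ is the standard deviation of $H_G = \mathbf{w}_G'\boldsymbol{\alpha}$. In Eq. (8.20),
$\lambda_G^2 = \rho_{H_G I_G}^2$ is the square of the canonical correlation between $H_G$ and $I_G$, and $\boldsymbol{\beta}_G$ is the
canonical vector associated with $\lambda_G^2 = \rho_{H_G I_G}^2$.

The maximized GESIM selection response and expected genetic gain per trait are

$$R_G = k_I\sqrt{\boldsymbol{\beta}_G'\boldsymbol{\Phi}\boldsymbol{\beta}_G} \tag{8.22}$$

and

$$\mathbf{E}_G = k_I\frac{\mathbf{A}\boldsymbol{\beta}_G}{\sqrt{\boldsymbol{\beta}_G'\boldsymbol{\Phi}\boldsymbol{\beta}_G}}, \tag{8.23}$$

respectively, where $\boldsymbol{\beta}_G$ is the first eigenvector of matrix $\boldsymbol{\Phi}^{-1}\mathbf{A}$. Vector $\boldsymbol{\beta}_G$ can be
transformed as $\boldsymbol{\beta}_C = \mathbf{F}\boldsymbol{\beta}_G$, where $\mathbf{F}$ is a diagonal matrix defined earlier.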
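Equations (8.22) and (8.23) are easy to evaluate once a vector of coefficients is available. The sketch below, with made-up matrices and an arbitrary coefficient vector (not an eigenvector from any data set in this book), also illustrates a property that matters when responses of different indices are compared: rescaling $\boldsymbol{\beta}_G$ by a positive constant rescales the selection response by the same constant but leaves the expected genetic gains per trait unchanged.

```python
import numpy as np

def gesim_response_and_gain(beta, Phi, A, k_I=1.755):
    """Evaluate Eq. (8.22) and Eq. (8.23) for a given vector of coefficients."""
    sd_I = np.sqrt(beta @ Phi @ beta)
    R = k_I * sd_I
    E = k_I * (A @ beta) / sd_I
    return R, E

# Made-up covariance matrices for t = 2 traits (Gamma < C < P).
G = np.array([[1.0, 0.3], [0.3, 0.8]])          # Gamma
C = G + np.array([[0.5, 0.1], [0.1, 0.4]])
P = C + np.diag([2.0, 1.5])
A = np.block([[C, G], [G, G]])
Phi = np.block([[P, G], [G, G]])

beta = np.array([0.2, -0.1, 0.9, 0.4])          # arbitrary illustrative coefficients

R1, E1 = gesim_response_and_gain(beta, Phi, A)
R2, E2 = gesim_response_and_gain(3.0 * beta, Phi, A)

print(round(float(R1), 4), round(float(R2), 4))
print(np.round(E1, 4), np.round(E2, 4))
```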


8.2.2 Numerical Examples 


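To compare the CLGSI versus GESIM theoretical results, we use a real maize
(Zea mays) F2 population with 244 genotypes (each with two repetitions), 233
molecular markers, and three traits: grain yield (GY, ton ha$^{-1}$), ear height (EHT,
cm), and plant height (PHT, cm). We estimated matrices $\mathbf{P}$ and $\mathbf{C}$ using Eqs. (2.22)
to (2.24) described in Chap. 2, whence the estimated matrices were

$$\hat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix} \quad \text{and} \quad \hat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}.$$

In a similar manner, we estimated matrix $\boldsymbol{\Gamma}$ by applying Eqs. (5.21) to (5.23) described in
Chap. 5 using phenotypic and marker information jointly; the estimated matrix was

$$\hat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}.$$

The selection intensity for making a selection cycle was 10% ($k_I = 1.755$) and the vector
of economic weights was $\mathbf{w}' = [5 \; -0.1 \; -0.1 \; 0 \; 0 \; 0]$. To obtain the estimated vector
of coefficients of the CLGSI ($\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\Phi}}^{-1}\hat{\mathbf{A}}\mathbf{w}$) and GESIM (Eq. 8.20), it is necessary to
construct matrices

$$\hat{\mathbf{A}} = \begin{bmatrix} \hat{\mathbf{C}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}} \end{bmatrix} \quad \text{and} \quad \hat{\boldsymbol{\Phi}} = \begin{bmatrix} \hat{\mathbf{P}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}} \end{bmatrix}.$$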
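The construction of the block matrices and the CLGSI computation can be sketched with the rounded estimates printed above. This is only an illustration of the formulas $\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\Phi}}^{-1}\hat{\mathbf{A}}\mathbf{w}$, $\hat{R} = k_I\sqrt{\hat{\boldsymbol{\beta}}'\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}}$, $\hat{\rho}_{HI} = \hat{\sigma}_I/\hat{\sigma}_H$ and $\hat{\mathbf{E}} = k_I\hat{\mathbf{A}}\hat{\boldsymbol{\beta}}/\hat{\sigma}_I$, with illustrative variable names; because the input matrices are rounded to two decimals, the output should match the CLGSI results reported in the text only up to rounding.

```python
import numpy as np

# Rounded estimates printed in the text (traits: GY, EHT, PHT).
P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
G = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])  # Gamma
w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])
k_I = 1.755

# Block matrices A and Phi.
A = np.block([[C, G], [G, G]])
Phi = np.block([[P, G], [G, G]])

# CLGSI vector of coefficients, response, accuracy and expected gains.
beta = np.linalg.solve(Phi, A @ w)          # beta = Phi^{-1} A w
sd_I = np.sqrt(beta @ Phi @ beta)
R = k_I * sd_I
accuracy = sd_I / np.sqrt(w @ A @ w)
E = k_I * (A @ beta) / sd_I

print(np.round(beta, 2))
print(round(float(R), 2), round(float(accuracy), 3))
print(np.round(E, 2))
```

Note that the phenotypic and GEBV coefficients of each trait add up to the economic weight of that trait, which follows from the second block row of $\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}} = \hat{\mathbf{A}}\mathbf{w}$ when $\hat{\boldsymbol{\Gamma}}$ is nonsingular.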
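The estimated CLGSI vector of coefficients for the traits GY, EHT, and
PHT and their associated GEBVs (GEBV$_{GY}$, GEBV$_{EHT}$, and GEBV$_{PHT}$ respectively)
was $\hat{\boldsymbol{\beta}}' = [0.08 \; -0.02 \; -0.01 \; 4.92 \; -0.08 \; -0.09]$, whereas the
estimated CLGSI selection response, accuracy, and expected genetic gain per
trait were $\hat{R} = k_I\sqrt{\hat{\boldsymbol{\beta}}'\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}} = 1.54$, $\hat{\rho}_{HI} = \dfrac{\hat{\sigma}_I}{\hat{\sigma}_H} = 0.814$, and
$\hat{\mathbf{E}}' = k_I\dfrac{\hat{\boldsymbol{\beta}}'\hat{\mathbf{A}}}{\sqrt{\hat{\boldsymbol{\beta}}'\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}}} = [0.36 \; 1.04 \; 1.70 \; 0.36 \; 1.53 \; 2.38]$ respectively. Finally,
$\hat{I} = 0.08\,\text{GY} - 0.02\,\text{EHT} - 0.01\,\text{PHT} + 4.92\,\text{GEBV}_{GY} - 0.08\,\text{GEBV}_{EHT} - 0.09\,\text{GEBV}_{PHT}$
was the estimated CLGSI.

The estimated GESIM vector of coefficients, selection response, accuracy,
and expected genetic gain per trait were
$\hat{\boldsymbol{\beta}}_{G_1}' = [-0.207 \; 0.029 \; 0.041 \; 0.820 \; 0.337 \; 0.411]$,
$\hat{R}_G = k_I\sqrt{\hat{\boldsymbol{\beta}}_{G_1}'\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{G_1}} = 6.288$,
$\hat{\rho}_{H_G I_G} = \dfrac{\sqrt{\hat{\boldsymbol{\beta}}_{G_1}'\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{G_1}}}{\sqrt{\hat{\boldsymbol{\beta}}_{G_1}'\hat{\boldsymbol{\Phi}}\hat{\mathbf{A}}^{-1}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{G_1}}} = 0.9056$, and
$\hat{\mathbf{E}}_G' = k_I\dfrac{\hat{\boldsymbol{\beta}}_{G_1}'\hat{\mathbf{A}}}{\sqrt{\hat{\boldsymbol{\beta}}_{G_1}'\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{G_1}}} = [0.369 \; 5.528 \; 9.186 \; 0.370 \; 5.250 \; 8.702]$ respectively.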

Fig. 8.2 presents the frequency distribution of the 244 estimated GESIM index 
values for one (Fig. 8.2a) and three traits (Fig. 8.2b) using the real data set described 
earlier. The frequency distribution of the estimated GESIM index values approaches 
the normal distribution for both indices. 

Fig. 8.2 Frequency distribution of the 244 estimated genomic eigen selection index method
(GESIM) values for the one-trait case (a) and for the three-trait case (b) for one selection cycle in
an environment for a real maize (Zea mays) F2 population with 233 molecular markers. Panel a:
frequency distribution for GESIM values for 1 trait and 1 GEBV; panel b: frequency distribution
for GESIM values for 3 traits and 3 GEBVs. Note that the frequency distribution of the estimated
GESIM index values approaches normal distribution for both indices

Table 8.3 Estimated combined linear genomic selection index (CLGSI) and estimated GESIM
expected genetic gains for four traits (T1, T2, T3, and T4) and their associated genomic estimated
breeding values (GEBV1, GEBV2, GEBV3, and GEBV4) for four simulated selection cycles. The
selection intensity was 10% ($k_I = 1.755$)

Estimated CLGSI expected genetic gain
          Traits                          Genomic estimated breeding value
Cycle     T1      T2      T3      T4      GEBV1   GEBV2   GEBV3   GEBV4
1         7.46    -3.69   3.26    1.60    7.28    -4.38   3.72    3.29
2         7.08    -3.45   2.91    1.17    7.08    -3.63   3.66    2.67
3         7.81    -3.51   2.06    0.76    7.30    -3.92   2.35    2.40
4         7.46    -2.76   2.48    0.81    6.84    -2.79   2.79    2.40
Average   7.45    -3.35   2.68    1.09    7.13    -3.68   3.13    2.69

Estimated GESIM expected genetic gain
          Traits                          Genomic estimated breeding value
Cycle     T1      T2      T3      T4      GEBV1   GEBV2   GEBV3   GEBV4
Average   8.18    -3.08   2.27    0.71    7.46    -3.53   2.86    2.39

Now, we compare the estimated CLGSI and GESIM selection response and
expected genetic gain per trait using the simulated data set described in Sect. 2.8.1
of Chap. 2 for four phenotypic selection cycles, each with four traits (T1, T2, T3, and
T4), 500 genotypes, and four replicates per genotype. The economic weights of T1,
T2, T3, and T4 were 1, -1, 1, and 1 respectively and the selection intensity for both
indices was 10% ($k_I = 1.755$). For this data set, matrix F was an identity matrix of
size 8 x 8 in all four selection cycles.
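
The value $k_I = 1.755$ quoted for a 10% selection intensity can be recovered from the standard normal distribution. The sketch below assumes truncation selection on a normally distributed index, in which case the standardized selection differential is the normal density at the truncation point divided by the selected proportion; the function name `selection_intensity` is illustrative and not from the text.

```python
from statistics import NormalDist


def selection_intensity(p: float) -> float:
    """Standardized selection differential when the top fraction p is selected."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    z = std_normal.inv_cdf(1.0 - p)      # truncation point on the index scale
    return std_normal.pdf(z) / p         # density at the truncation point over p


k_I = selection_intensity(0.10)
print(round(k_I, 3))  # 1.755
```

Selecting a smaller proportion gives a larger intensity, which is why the same index yields a larger expected response under stricter selection.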

For this data set, the averages of the estimated CLGSI and GESIM selection 
responses were 0.68 and 2.74 (Table 8.1) respectively. The estimated CLGSI 
selection response was lower than the estimated GESIM selection response. 
Table 8.3 presents the estimated CLGSI and GESIM expected genetic gain for 
four traits (T1, T2, T3, and T4) and their associated genomic estimated breeding 
values (GEBV1, GEBV2, GEBV3, and GEBV4) for four simulated selection cycles. 
The averages of the estimated CLGSI expected genetic gains for the four traits and 
their associated GEBVs were 7.45, —3.35, 2.68, 1.09, 7.13, —3.68, 3.13, and 2.69 
respectively, whereas the averages of the estimated GESIM expected genetic gains 
for the four traits and their associated GEBVs were 8.18, —3.08, 2.27, 0.71, 7.46, 
—3.53, 2.86, and 2.39 respectively. The estimated CLGSI and GESIM expected 
genetic gains per trait were very similar. 


8.3 The Genome-Wide Linear Eigen Selection Index 
Method 


The MESIM requires regressing phenotypic values on marker coded values to 
predict the marker score values for each individual candidate for selection, and 
then combining the marker scores with phenotypic information using the MESIM
to obtain a final prediction of the net genetic merit. In addition, the GESIM requires
fitting of a statistical model to estimate all available marker effects in the training 
population; these estimates are then used to obtain GEBVs, which are predictors of 
breeding values. Crossa and Cerón-Rojas (2011) extended the ESIM theory to a 
genome-wide linear molecular ESIM (GW-ESIM) similar to the GW-LMSI 
described in Chap. 4. The GW-LMSI and GW-ESIM are very similar and only 
minor changes are necessary in GW-ESIM; for example, instead of estimating the 
GW-LMSI vector of coefficients according to the LPSI method (Chap. 2), the 
GW-ESIM vector of coefficients is estimated according to the singular value decom- 
position (SVD) described in Chap. 7. 


8.3.1 The GW-ESIM Parameters 


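In the GW-ESIM context, the net genetic merit can be written as

\[
H = \mathbf{w}'_1\mathbf{g} + \mathbf{w}'_2\mathbf{m}
  = [\mathbf{w}'_1 \;\; \mathbf{w}'_2] \begin{bmatrix} \mathbf{g} \\ \mathbf{m} \end{bmatrix}
  = \mathbf{w}'\mathbf{x}, \tag{8.24}
\]

where $\mathbf{g}' = [g_1 \;\cdots\; g_t]$ is the vector of true breeding values, $t$ is the number of
traits, $\mathbf{w}'_1 = [w_1 \;\cdots\; w_t]$ is the vector of unknown economic weights associated
with the breeding values; $\mathbf{w}'_2 = [0_1 \;\cdots\; 0_N]$ is a null vector associated with the
vector of marker code values $\mathbf{m}' = [m_1 \;\cdots\; m_N]$, where $m_j$ ($j = 1, 2, \ldots,$
$N$ = number of markers) is the jth marker in the training population;
$\mathbf{w}' = [\mathbf{w}'_1 \;\; \mathbf{w}'_2]$ and $\mathbf{x}' = [\mathbf{g}' \;\; \mathbf{m}']$. The GW-ESIM ($I$) index
combines the phenotypic value and all the marker information of individuals to predict
Eq. (8.24) values in each selection cycle and can be written as

\[
I = \boldsymbol{\beta}'_y\mathbf{y} + \boldsymbol{\beta}'_m\mathbf{m}
  = [\boldsymbol{\beta}'_y \;\; \boldsymbol{\beta}'_m] \begin{bmatrix} \mathbf{y} \\ \mathbf{m} \end{bmatrix}
  = \boldsymbol{\beta}'\mathbf{q}, \tag{8.25}
\]

where $\boldsymbol{\beta}_y$ and $\boldsymbol{\beta}_m$ are vectors of phenotypic and marker weights respectively;
$\mathbf{y}' = [y_1 \;\cdots\; y_t]$ is the vector of phenotypic values; $\mathbf{m}$ was defined in
Eq. (8.24); $\boldsymbol{\beta}' = [\boldsymbol{\beta}'_y \;\; \boldsymbol{\beta}'_m]$ and $\mathbf{q}' = [\mathbf{y}' \;\; \mathbf{m}']$.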

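Let $\sigma_I^2 = \boldsymbol{\beta}'\mathbf{Q}\boldsymbol{\beta}$ and $\sigma_H^2 = \mathbf{w}'\mathbf{X}\mathbf{w}$ be the variances of
$I = \boldsymbol{\beta}'\mathbf{q}$ and $H = \mathbf{w}'\mathbf{x}$ respectively, and
$\sigma_{HI} = \mathbf{w}'\mathbf{X}\boldsymbol{\beta}$ the covariance between $I$ and $H$, where

\[
\mathbf{Q} = \mathrm{Var}(\mathbf{q}) = \begin{bmatrix} \mathbf{P} & \mathbf{G}'_M \\ \mathbf{G}_M & \mathbf{M} \end{bmatrix}
\quad \text{and} \quad
\mathbf{X} = \mathrm{Var}(\mathbf{x}) = \begin{bmatrix} \mathbf{C} & \mathbf{G}'_M \\ \mathbf{G}_M & \mathbf{M} \end{bmatrix}
\]

are block matrices of size $(t + N) \times (t + N)$ ($t$ is the number of traits and $N$ is the
number of markers), where $\mathbf{P} = \mathrm{Var}(\mathbf{y})$, $\mathbf{M} = \mathrm{Var}(\mathbf{m})$, and
$\mathbf{C} = \mathrm{Var}(\mathbf{g})$ are the covariance matrices of phenotypic ($\mathbf{y}$), coded marker
($\mathbf{m}$), and genetic ($\mathbf{g}$) values respectively, whereas
$\mathbf{G}_M = \mathrm{Cov}(\mathbf{y}, \mathbf{m}) = \mathrm{Cov}(\mathbf{g}, \mathbf{m})$ is the covariance matrix
between $\mathbf{y}$ and $\mathbf{m}$, and between $\mathbf{g}$ and $\mathbf{m}$ (for details see Chap. 4);
$\mathbf{w}$ and $\boldsymbol{\beta}$ were defined earlier. Note that although matrices $\mathbf{P}$ and $\mathbf{C}$
are of size $t \times t$, matrices $\mathbf{M}$ and $\mathbf{G}_M$ are of size $N \times N$ and $N \times t$
respectively. Thus, if the number of markers is very high, matrices $\mathbf{M}$ and
$\mathbf{G}_M$ could also be very large.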

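In Chap. 4 we described matrix M as

\[
\mathbf{M} = \begin{bmatrix}
1 & (1 - 2\theta_{12}) & \cdots & (1 - 2\theta_{1N}) \\
(1 - 2\theta_{21}) & 1 & \cdots & (1 - 2\theta_{2N}) \\
\vdots & \vdots & \ddots & \vdots \\
(1 - 2\theta_{N1}) & (1 - 2\theta_{N2}) & \cdots & 1
\end{bmatrix} \tag{8.26}
\]

where $(1 - 2\theta_{ij})$ and $\theta_{ij}$ ($i, j = 1, 2, \ldots, N$ = number of markers) are the covariance
(or correlation) and the recombination frequency between the ith and jth marker
respectively, whereas matrix $\mathbf{G}_M$ can be written as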


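\[
\mathbf{G}_M = \begin{bmatrix}
(1 - 2r_{11})\alpha_{11} & (1 - 2r_{12})\alpha_{12} & \cdots & (1 - 2r_{1N})\alpha_{1N_Q} \\
(1 - 2r_{21})\alpha_{21} & (1 - 2r_{22})\alpha_{22} & \cdots & (1 - 2r_{2N})\alpha_{2N_Q} \\
\vdots & \vdots & \ddots & \vdots \\
(1 - 2r_{N1})\alpha_{N1} & (1 - 2r_{N2})\alpha_{N2} & \cdots & (1 - 2r_{NN})\alpha_{NN_Q}
\end{bmatrix} \tag{8.27}
\]

where $(1 - 2r_{ik})\alpha_{qk}$ ($i = 1, 2, \ldots, N$; $k = 1, 2, \ldots, N_Q$ = number of quantitative trait
loci (QTL); $q = 1, 2, \ldots, t$) is the covariance between the qth trait and the ith marker;
$r_{ik}$ is the recombination frequency between the ith marker and the kth QTL, and $\alpha_{qk}$ is the
effect of the kth QTL over the qth trait.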
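
To make Eq. (8.26) concrete, the sketch below builds matrix M from a table of pairwise recombination frequencies. It assumes NumPy; the helper name `marker_covariance` and the three-marker frequencies are hypothetical and serve only as an illustration.

```python
import numpy as np


def marker_covariance(theta):
    """Return M with entries 1 - 2*theta_ij (Eq. 8.26).

    theta is a symmetric matrix of pairwise recombination frequencies
    with zeros on the diagonal, so the diagonal of M equals 1.
    """
    theta = np.asarray(theta, dtype=float)
    if not np.allclose(theta, theta.T):
        raise ValueError("theta must be symmetric")
    if np.any(theta < 0.0) or np.any(theta > 0.5):
        raise ValueError("recombination frequencies must lie in [0, 0.5]")
    return 1.0 - 2.0 * theta


# Hypothetical recombination frequencies among three markers.
theta = np.array([[0.00, 0.10, 0.25],
                  [0.10, 0.00, 0.18],
                  [0.25, 0.18, 0.00]])
M = marker_covariance(theta)
print(M)
```

Unlinked markers (recombination frequency 0.5) contribute zero off-diagonal entries, so M approaches the identity matrix as the markers become independent.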

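Let $\rho_{HI} = \dfrac{\mathbf{w}'\mathbf{X}\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\mathbf{X}\mathbf{w}}\sqrt{\boldsymbol{\beta}'\mathbf{Q}\boldsymbol{\beta}}}$
be the correlation between $I = \boldsymbol{\beta}'\mathbf{q}$ and $H = \mathbf{w}'\mathbf{x}$; then, the
GW-ESIM selection response can be written as

\[
R = k_I \sigma_H \rho_{HI}, \tag{8.28}
\]

where $k_I$ is the standardized selection differential (or selection intensity) associated
with GW-ESIM and $\sigma_H = \sqrt{\mathbf{w}'\mathbf{X}\mathbf{w}}$ is the standard deviation of $H$.
Assuming that $k_I$ is fixed, and that matrices $\mathbf{Q}$ and $\mathbf{X}$ are known, we can maximize
$R$ (Eq. 8.28) by maximizing $\rho_{HI}$ with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ under the
restrictions $\sigma_H^2 = \mathbf{w}'\mathbf{X}\mathbf{w}$, $\sigma_I^2 = \boldsymbol{\beta}'\mathbf{Q}\boldsymbol{\beta}$, and
$0 < \sigma_H^2, \sigma_I^2 < \infty$, similar to the MESIM and GESIM. It can be shown that vector
$\mathbf{w}$ can be written as

\[
\mathbf{w}_W = \mathbf{X}^{-1}\mathbf{Q}\boldsymbol{\beta} \tag{8.29}
\]

and that $H_W = \mathbf{w}'_W\mathbf{x}$ is the net genetic merit in the GW-ESIM context. The
correlation between $H_W = \mathbf{w}'_W\mathbf{x}$ and $I = \boldsymbol{\beta}'\mathbf{q}$ is
$\rho_{H_W I} = \dfrac{\sqrt{\boldsymbol{\beta}'\mathbf{Q}\boldsymbol{\beta}}}{\sqrt{\boldsymbol{\beta}'\mathbf{Q}\mathbf{X}^{-1}\mathbf{Q}\boldsymbol{\beta}}}$
and the GW-ESIM vector of coefficients ($\boldsymbol{\beta}$) that maximizes $\rho_{H_W I}$ can be obtained from
equation

\[
(\mathbf{Q}^{-1}\mathbf{X} - \lambda_W^2 \mathbf{I}_{(t+N)})\boldsymbol{\beta}_W = \mathbf{0}, \tag{8.30}
\]

where $\mathbf{I}_{(t+N)}$ is an identity matrix of size $(t + N) \times (t + N)$ and
$I_W = \boldsymbol{\beta}'_W\mathbf{q}$ is the optimized GW-ESIM. The accuracy of the GW-ESIM can be written as

\[
\rho_{H_W I_W} = \frac{\sqrt{\boldsymbol{\beta}'_W\mathbf{Q}\boldsymbol{\beta}_W}}{\sqrt{\boldsymbol{\beta}'_W\mathbf{Q}\mathbf{X}^{-1}\mathbf{Q}\boldsymbol{\beta}_W}}
= \frac{\sigma_{I_W}}{\sigma_{H_W}}, \tag{8.31}
\]

where $\sigma_{I_W} = \sqrt{\boldsymbol{\beta}'_W\mathbf{Q}\boldsymbol{\beta}_W}$ is the standard deviation of
$I_W = \boldsymbol{\beta}'_W\mathbf{q}$, and $\sigma_{H_W} = \sqrt{\boldsymbol{\beta}'_W\mathbf{Q}\mathbf{X}^{-1}\mathbf{Q}\boldsymbol{\beta}_W}$
is the standard deviation of $H_W = \mathbf{w}'_W\mathbf{x}$. In Eq. (8.30), $\lambda_W^2 = \rho^2_{H_W I_W}$
is the square of the canonical correlation between $H_W$ and $I_W$.
The maximized GW-ESIM selection response and expected genetic gain per trait
are

\[
R_W = k_I \sqrt{\boldsymbol{\beta}'_W\mathbf{Q}\boldsymbol{\beta}_W} \tag{8.32}
\]

and

\[
\mathbf{E}_W = k_I \frac{\mathbf{X}\boldsymbol{\beta}_W}{\sqrt{\boldsymbol{\beta}'_W\mathbf{Q}\boldsymbol{\beta}_W}} \tag{8.33}
\]

respectively, where $\boldsymbol{\beta}_W$ is the first eigenvector of Eq. (8.30).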
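
The algebra behind Eqs. (8.29) to (8.31) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses NumPy, the matrices are random positive definite stand-ins that ignore the block structure of Q and X, and the eigenproblem is solved through a Cholesky-whitened symmetric matrix rather than the SVD route mentioned in the text. It confirms that, for the first eigenvector, the covariance between $H_W$ and $I_W$ equals the variance of $I_W$ and that the accuracy equals the square root of the first eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5   # t + N, e.g. t = 2 traits and N = 3 markers (hypothetical sizes)


def random_spd(size):
    """Random symmetric positive definite matrix, for illustration only."""
    B = rng.normal(size=(size, size))
    return B @ B.T + size * np.eye(size)


# Synthetic stand-ins, NOT estimates: X plays the role of Var(x) and
# Q = X + (positive definite noise) plays the role of Var(q).
X = random_spd(n)
Q = X + random_spd(n)

# Solve (Q^{-1} X - lambda^2 I) beta = 0 through a symmetric problem:
# with Q = L L', the matrix L^{-1} X L^{-T} has the same eigenvalues.
L = np.linalg.cholesky(Q)
L_inv = np.linalg.inv(L)
eigvals, vecs = np.linalg.eigh(L_inv @ X @ L_inv.T)   # ascending eigenvalues
lam2 = eigvals[-1]                                     # largest eigenvalue
beta_W = L_inv.T @ vecs[:, -1]                         # first eigenvector of Q^{-1} X

w_W = np.linalg.solve(X, Q @ beta_W)    # Eq. (8.29)
var_I = beta_W @ Q @ beta_W             # variance of I_W
var_H = w_W @ X @ w_W                   # variance of H_W
cov_HI = w_W @ X @ beta_W               # covariance between H_W and I_W

accuracy = np.sqrt(var_I / var_H)       # Eq. (8.31)
print(np.isclose(cov_HI, var_I), np.isclose(accuracy, np.sqrt(lam2)))
```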


8.3.2 Estimating GW-ESIM Parameters 


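In Chap. 2, Eqs. (2.22) to (2.24), we described the restricted maximum likelihood
methods to estimate matrices C and P, which can be denoted by $\hat{\mathbf{C}}$ and $\hat{\mathbf{P}}$. In
Chap. 4, we described how to estimate matrices $\mathbf{M}$ and $\mathbf{G}_M$, which can be denoted
by $\hat{\mathbf{M}}$ and $\hat{\mathbf{G}}_M$. With these estimates, we constructed the block estimated matrices as
$\hat{\mathbf{Q}} = \begin{bmatrix} \hat{\mathbf{P}} & \hat{\mathbf{G}}'_M \\ \hat{\mathbf{G}}_M & \hat{\mathbf{M}} \end{bmatrix}$ and
$\hat{\mathbf{X}} = \begin{bmatrix} \hat{\mathbf{C}} & \hat{\mathbf{G}}'_M \\ \hat{\mathbf{G}}_M & \hat{\mathbf{M}} \end{bmatrix}$,
whence we obtained the equation

\[
(\hat{\mathbf{Q}}^{-1}\hat{\mathbf{X}} - \hat{\lambda}^2_{W_j}\mathbf{I}_{(t+N)})\hat{\boldsymbol{\beta}}_{W_j} = \mathbf{0}, \tag{8.34}
\]

$j = 1, 2, \ldots, (t + N)$, where $(t + N)$ is the number of traits and markers in
the GW-ESIM index. Similar to the MESIM, we obtained estimators $\hat{\boldsymbol{\beta}}_{W_1}$ and
$\hat{\lambda}^2_{W_1}$ of the first eigenvector $\boldsymbol{\beta}_{W_1}$ and the first eigenvalue $\lambda^2_{W_1}$
respectively, from equation

\[
(\hat{\mathbf{E}}\hat{\mathbf{E}}' - \hat{\mu}_j\mathbf{I}_{(t+N)})\hat{\boldsymbol{\beta}}_{W_j} = \mathbf{0}, \tag{8.35}
\]

where $\hat{\mathbf{E}} = \hat{\mathbf{Q}}^{-1}\hat{\mathbf{X}}$ and $\hat{\mu}_j = \hat{\lambda}^4_{W_j}$. These results allow the
GW-ESIM index, its selection response, and its expected genetic gain per trait to be estimated as
$\hat{I}_W = \hat{\boldsymbol{\beta}}'_W\mathbf{q}$,
$\hat{R}_W = k_I\sqrt{\hat{\boldsymbol{\beta}}'_W\hat{\mathbf{Q}}\hat{\boldsymbol{\beta}}_W}$, and
$\hat{\mathbf{E}}_W = k_I\dfrac{\hat{\mathbf{X}}\hat{\boldsymbol{\beta}}_W}{\sqrt{\hat{\boldsymbol{\beta}}'_W\hat{\mathbf{Q}}\hat{\boldsymbol{\beta}}_W}}$
respectively, whereas the estimator of GW-ESIM accuracy is $\hat{\lambda}_W$.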


8.3.3 Numerical Examples 


We compare the estimated GW-LMSI and GW-ESIM selection responses using the 
simulated data set described in Sect. 2.8.1 of Chap. 2, with a selection intensity of 
10% (k; = 1.755). Table 8.1 presents the estimated GW-LMSI selection response for 
four simulated selection cycles when their vectors of coefficients are normalized, 
whence it can be seen that the average estimated GW-LMSI selection response was 
0.87. Table 8.1 also presents the estimated GW-ESIM selection response for four 
simulated selection cycles; the average of the estimated GW-ESIM selection 
responses was 0.93. Thus, for this data set, the estimated GW-LMSI and GW-ESIM selection
responses were very similar.


8.4 The Restricted Linear Genomic Eigen Selection Index 
Method 


The restricted linear genomic eigen selection index method (RGESIM) is based on 
the restricted linear phenotypic ESIM (RESIM) theory described in Chap. 7. In the 
RESIM, the breeder's objective is to improve only (t — r) of t (r < t) traits, leaving 
r of them fixed. The same is true for RGESIM, but in this case, we should impose 2r 
restrictions, i.e., we need to fix r traits and their associated r GEBV to obtain results 
similar to those obtained with the RESIM (see Chap. 7 for details). This is the main 
difference between the RGESIM and the RESIM. 

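It can be shown that $\mathrm{Cov}(I, \mathbf{a}) = \mathbf{A}\boldsymbol{\beta}$ is the covariance between the
breeding value vector ($\mathbf{a}' = [\mathbf{g}' \;\; \boldsymbol{\gamma}']$) and the GESIM index
($I = \boldsymbol{\beta}'\mathbf{f}$). In the RGESIM, we want some covariances between the linear combinations
of $\mathbf{a}$ ($\mathbf{U}'_G\mathbf{a}$) and $I = \boldsymbol{\beta}'\mathbf{f}$ to be zero,
i.e., $\mathrm{Cov}(I, \mathbf{U}'_G\mathbf{a}) = \mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$, where
$\mathbf{U}'_G$ is a matrix $2(t - 1) \times 2t$ of 1s and 0s
(1 indicates that the trait and its associated GEBV are restricted, and 0 indicates that
the trait and its GEBV have no restrictions). We can solve this problem by maximizing
$\dfrac{\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta}}{\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}$
with respect to vector $\boldsymbol{\beta}$ under the restrictions
$\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$ and $\boldsymbol{\beta}'\boldsymbol{\beta} = 1$,
similar to the RESIM, or by maximizing the correlation between $H = \mathbf{w}'\mathbf{a}$ and
$I = \boldsymbol{\beta}'\mathbf{f}$,
$\rho_{HI} = \dfrac{\mathbf{w}'\mathbf{A}\boldsymbol{\beta}}{\sqrt{\mathbf{w}'\mathbf{A}\mathbf{w}}\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}$,
with respect to vectors $\mathbf{w}$ and $\boldsymbol{\beta}$ under the restrictions
$\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$, $\sigma_H^2 = \mathbf{w}'\mathbf{A}\mathbf{w}$,
$\sigma_I^2 = \boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}$, and $0 < \sigma_H^2, \sigma_I^2 < \infty$,
as we did for the GESIM.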


8.4.1 The RGESIM Parameters 


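To obtain the RGESIM vector of coefficients, we maximize the function

\[
f(\boldsymbol{\beta}, \mathbf{v}') = \frac{\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta}}{\sqrt{\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta}}}
 - \mathbf{v}'\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} \tag{8.36}
\]

with respect to $\boldsymbol{\beta}$ and $\mathbf{v}'$, where $\mathbf{v}' = [v_1 \;\; v_2 \;\cdots\; v_{2(t-1)}]$ is a
vector of Lagrange multipliers. The derivatives of function $f(\boldsymbol{\beta}, \mathbf{v}')$ with respect to
$\boldsymbol{\beta}$ and $\mathbf{v}'$ can be written as

\[
2(\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta})^{1/2}\mathbf{A}\boldsymbol{\beta}
 - (\boldsymbol{\beta}'\boldsymbol{\Phi}\boldsymbol{\beta})^{-1/2}(\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta})\boldsymbol{\Phi}\boldsymbol{\beta}
 - \mathbf{A}\mathbf{U}_G\mathbf{v} = \mathbf{0}, \tag{8.37}
\]

\[
\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}, \tag{8.38}
\]

respectively, where Eq. (8.38) denotes the restriction imposed for maximizing
Eq. (8.36). Using algebraic methods on Eq. (8.37), we get

\[
(\mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A} - \lambda^2_{RG}\mathbf{I}_{2t})\boldsymbol{\beta}_{RG} = \mathbf{0}, \tag{8.39}
\]

where $\lambda^2_{RG} = h^2_{I_{RG}}$ is the RGESIM heritability obtained under the restriction
$\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$; $\mathbf{K}_{RG} = [\mathbf{I}_{2t} - \mathbf{Q}_{RG}]$,
$\mathbf{I}_{2t}$ is an identity matrix of size $2t \times 2t$, and
$\mathbf{Q}_{RG} = \boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G)^{-1}\mathbf{U}'_G\mathbf{A}$.
When $\mathbf{U}'_G$ is a null matrix, $\boldsymbol{\beta}'_{RG} = \boldsymbol{\beta}'_G$
(the vector of the GESIM coefficients); thus, the RGESIM is more general than
the GESIM and includes the GESIM as a particular case. The RGESIM index
$I_{RG} = \boldsymbol{\beta}'_{RG}\mathbf{f}$ and its selection response and expected genetic gain per trait use the first
eigenvector of matrix $\mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A}$. It can be shown that the vector of coefficients of
$H = \mathbf{w}'_{RG}\mathbf{a}$ in the RGESIM can be written as

\[
\mathbf{w}_{RG} = \mathbf{A}^{-1}[\boldsymbol{\Phi} + \mathbf{Q}'_{RG}\mathbf{A}]\boldsymbol{\beta}_{RG}, \tag{8.40}
\]

where $\mathbf{Q}'_{RG} = \mathbf{A}\mathbf{U}_G(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G)^{-1}\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}$.

Note that the restriction $\mathbf{U}'_G\mathbf{A}\boldsymbol{\beta} = \mathbf{0}$ can be written as
$\boldsymbol{\beta}'\mathbf{A}\mathbf{U}_G = \mathbf{0}$; this means that
$\boldsymbol{\beta}'_{RG}\mathbf{Q}'_{RG} = \mathbf{0}$ and that the covariance between
$H_{RG} = \mathbf{w}'_{RG}\mathbf{a}$ and $I_{RG} = \boldsymbol{\beta}'_{RG}\mathbf{f}$ ($\sigma_{H_{RG}I_{RG}}$)
can be written as

\[
\sigma_{H_{RG}I_{RG}} = \mathbf{w}'_{RG}\mathbf{A}\boldsymbol{\beta}_{RG}
 = \boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG} + \boldsymbol{\beta}'_{RG}\mathbf{Q}'_{RG}\mathbf{A}\boldsymbol{\beta}_{RG}
 = \boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}. \tag{8.41}
\]

Equation (8.41) indicates that $\sigma_{H_{RG}I_{RG}}$ is equal to the variance of
$I_{RG} = \boldsymbol{\beta}'_{RG}\mathbf{f}$ ($\sigma^2_{I_{RG}} = \boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}$);
therefore, the maximized correlation between $I_{RG}$ and $H_{RG}$, or
RGESIM accuracy, can be written as

\[
\rho_{H_{RG}I_{RG}} = \frac{\sqrt{\boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}}}{\sqrt{\mathbf{w}'_{RG}\mathbf{A}\mathbf{w}_{RG}}}, \tag{8.42}
\]

where $\mathbf{w}'_{RG}\mathbf{A}\mathbf{w}_{RG}$ is the variance of $H_{RG}$. Hereafter, to simplify the notation, we write
Eq. (8.42) as $\lambda_{RG}$.
The maximized selection response and the expected genetic gain per trait of the
RGESIM are

\[
R_{RG} = k_I\sqrt{\boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}} \tag{8.43}
\]

and

\[
\mathbf{E}_{RG} = k_I\frac{\mathbf{A}\boldsymbol{\beta}_{RG}}{\sqrt{\boldsymbol{\beta}'_{RG}\boldsymbol{\Phi}\boldsymbol{\beta}_{RG}}} \tag{8.44}
\]

respectively, where $\boldsymbol{\beta}_{RG}$ is the first eigenvector of matrix $\mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A}$.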


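8.4.2 Estimating RGESIM Parameters


In Sect. 8.2, we indicated how to estimate matrices P, Γ, and C using phenotypic and
genomic information, whence we can estimate matrices
$\hat{\mathbf{A}} = \begin{bmatrix} \hat{\mathbf{C}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}} \end{bmatrix}$ and
$\hat{\boldsymbol{\Phi}} = \begin{bmatrix} \hat{\mathbf{P}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}} \end{bmatrix}$.
Those methods are also valid for the RGESIM. This means that the
SVD methods described for estimating MESIM parameters are also valid for
estimating RGESIM parameters.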


8.4.3 Numerical Examples 


With a selection intensity of 10% ($k_I = 1.755$), we compare the CRLGSI (for details
see Chap. 6) versus the RGESIM theoretical results using a real maize (Zea mays)
F2 population with 244 genotypes (each with two repetitions), 233 molecular
markers, and three traits, GY (ton ha$^{-1}$), EHT (cm), and PHT (cm), described in
Sect. 8.2.2, where

\[
\hat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix}, \quad
\hat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}, \quad \text{and} \quad
\hat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}
\]

were the estimated matrices of P, C, and Γ respectively.

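We have indicated that the main difference between the RLPSI and the CRLGSI
is the matrix $\mathbf{U}'_C$, on which we now need to impose two restrictions: one for the trait
and another for its associated GEBV. Consider the data set described earlier and
suppose that we restrict the trait GY (ton ha$^{-1}$) and its associated GEBV$_{GY}$; then,
matrix $\mathbf{U}'_C$ should be constructed as

\[
\mathbf{U}'_{C_1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}.
\]

If we restrict traits GY and EHT (cm) and their associated GEBV$_{GY}$ and GEBV$_{EHT}$, matrix
$\mathbf{U}'_C$ should be constructed as

\[
\mathbf{U}'_{C_2} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix},
\]

etc. The procedure for obtaining matrices $\mathbf{K}_{RG} = [\mathbf{I}_{2t} - \mathbf{Q}_{RG}]$ and
$\mathbf{Q}_{RG} = \boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G)^{-1}\mathbf{U}'_G\mathbf{A}$
was described in Chap. 6, and is also valid for the RGESIM parameters.

The estimated CRLGSI vector of coefficients is $\hat{\boldsymbol{\beta}}_{CR} = \hat{\mathbf{K}}_{RG}\hat{\boldsymbol{\beta}}$,
where $\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\Phi}}^{-1}\hat{\mathbf{A}}\mathbf{w}$ is the estimated CLGSI vector of
coefficients (Chap. 6). Let $\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1 \;\; 0 \;\; 0 \;\; 0]$ be the vector of economic
weights and suppose that we restrict trait GY and its associated GEBV$_{GY}$; in this case,
$\mathbf{U}'_C = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$,
and according to matrices $\hat{\mathbf{P}}$, $\hat{\mathbf{C}}$, and $\hat{\boldsymbol{\Gamma}}$ described earlier,
$\hat{\boldsymbol{\beta}}'_{CR} = [0.076 \;\; -0.004 \;\; -0.018 \;\; 2.353 \;\; -0.096 \;\; -0.082]$ was the estimated
CRLGSI vector of coefficients and the estimated CRLGSI was

\[
\hat{I}_{CR} = 0.076\,\mathrm{GY} - 0.004\,\mathrm{EHT} - 0.018\,\mathrm{PHT} + 2.353\,\mathrm{GEBV}_{GY} - 0.096\,\mathrm{GEBV}_{EHT} - 0.082\,\mathrm{GEBV}_{PHT},
\]

where GEBV$_{GY}$, GEBV$_{EHT}$, and GEBV$_{PHT}$ are the GEBVs associated with the traits
GY, EHT, and PHT respectively. The same procedure is valid for two or more
restrictions.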

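The estimated CRLGSI selection response and expected genetic gain per
trait were
$\hat{R}_{CR} = k_I\sqrt{\hat{\boldsymbol{\beta}}'_{CR}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{CR}} = 0.96$ and
$\hat{\mathbf{E}}'_{CR} = k_I\dfrac{\hat{\boldsymbol{\beta}}'_{CR}\hat{\mathbf{A}}}{\sqrt{\hat{\boldsymbol{\beta}}'_{CR}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{CR}}}
= [0 \;\; -3.53 \;\; -6.03 \;\; 0 \;\; -2.93 \;\; -4.87]$ respectively, whereas the estimated
CRLGSI accuracy was $\hat{\rho}_{HI_{CR}} = \dfrac{\hat{\sigma}_{I_{CR}}}{\hat{\sigma}_H} = 0.51$. Note that in
$\hat{\mathbf{E}}'_{CR}$ the trait GY and its associated GEBV$_{GY}$ have null values, as we would expect.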
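
The null restriction on GY and its GEBV can be illustrated with the printed matrices. The sketch below assumes NumPy and uses illustrative variable names; it builds the restriction matrix, the projector $\mathbf{K}_{RG} = \mathbf{I}_{2t} - \mathbf{Q}_{RG}$, and the restricted coefficients, and then confirms that the expected gains of the restricted entries vanish. Because the printed matrices are rounded to two decimals and the restricted columns are nearly collinear, the coefficients it prints need not match the reported ones digit for digit.

```python
import numpy as np

P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
G = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])
A = np.block([[C, G], [G, G]])
Phi = np.block([[P, G], [G, G]])
w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])
k_I = 1.755

# One column of U per restricted entry: GY (position 0) and GEBV_GY (position 3).
U = np.zeros((6, 2))
U[0, 0] = 1.0
U[3, 1] = 1.0

Phi_inv_A = np.linalg.solve(Phi, A)          # Phi^{-1} A
AU = A @ U
S = U.T @ A @ Phi_inv_A @ U                  # U' A Phi^{-1} A U
# Q = Phi^{-1} A U S^{-1} U' A (A is symmetric, so U' A equals AU transposed).
Q_R = np.linalg.solve(Phi, AU) @ np.linalg.solve(S, AU.T)
K = np.eye(6) - Q_R

beta = Phi_inv_A @ w                         # unrestricted CLGSI coefficients
beta_CR = K @ beta                           # coefficients under the null restrictions

sigma_I = np.sqrt(beta_CR @ Phi @ beta_CR)
R_CR = k_I * sigma_I                         # restricted selection response
E_CR = k_I * (A @ beta_CR) / sigma_I         # expected genetic gain per trait

print(np.round(beta_CR, 3), round(float(R_CR), 2))
print(np.round(E_CR, 2))
```

The restricted response can never exceed the unrestricted one, since the restriction removes directions from the set of admissible coefficient vectors.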

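The estimated RGESIM vector of coefficients was
$\hat{\boldsymbol{\beta}}'_{RG} = [0.015 \;\; -0.001 \;\; -0.004 \;\; 0.998 \;\; -0.029 \;\; -0.045]$, and the estimated
RGESIM index was

\[
\hat{I}_{RG} = 0.015\,\mathrm{GY} - 0.001\,\mathrm{EHT} - 0.004\,\mathrm{PHT} + 0.998\,\mathrm{GEBV}_{GY} - 0.029\,\mathrm{GEBV}_{EHT} - 0.045\,\mathrm{GEBV}_{PHT},
\]

where GEBV$_{GY}$, GEBV$_{EHT}$, and GEBV$_{PHT}$ are the GEBVs associated with traits GY, EHT,
and PHT respectively. The same procedure is valid for two or more restrictions.

The estimated RGESIM selection response and expected genetic gain per
trait were
$\hat{R}_{RG} = k_I\sqrt{\hat{\boldsymbol{\beta}}'_{RG}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{RG}} = 0.37$ and
$\hat{\mathbf{E}}'_{RG} = k_I\dfrac{\hat{\boldsymbol{\beta}}'_{RG}\hat{\mathbf{A}}}{\sqrt{\hat{\boldsymbol{\beta}}'_{RG}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{RG}}}
= [0 \;\; -3.28 \;\; -6.00 \;\; 0 \;\; -2.93 \;\; -5.40]$ respectively, whereas the estimated
RGESIM accuracy was $\hat{\rho}_{H_{RG}I_{RG}} = \dfrac{\hat{\sigma}_{I_{RG}}}{\hat{\sigma}_{H_{RG}}} = 0.86$.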


Fig. 8.3 presents the frequency distribution of the 244 estimated RGESIM index
values for two null restrictions on traits GY and EHT and their associated GEBV$_{GY}$
and GEBV$_{EHT}$, for one selection cycle in an environment for a real maize (Zea mays)
F2 population with 233 molecular markers. Note that the frequency distribution of
the estimated RGESIM index values approaches the normal distribution.

Now we compare the estimated CRLGSI and RGESIM selection responses and
expected genetic gains per trait using the simulated data set described in Sect. 2.8.1
of Chap. 2. We used that data set for four phenotypic selection cycles (C2, C3, C4,
and C5), each with four traits (T1, T2, T3, and T4), 500 genotypes, and four replicates
per genotype. The economic weights for T1, T2, T3, and T4 were 1, -1, 1, and
1 respectively. For this data set, matrix F was an identity matrix of size 8 x 8 for all
four selection cycles.

Fig. 8.3 Frequency distribution of the 244 estimated restricted genomic eigen selection index
method (RGESIM) values for two null restrictions on traits grain yield (GY) and EHT and their
associated genomic estimated breeding values (GEBVs), GEBV$_{GY}$ and GEBV$_{EHT}$ respectively, for
one selection cycle in an environment for a real maize (Zea mays) F2 population with 233 molecular
markers. Note that the frequency distribution of the estimated RGESIM index values approaches
normal distribution

Columns 2, 3, and 4 (from left to right) of Table 8.4 present the estimated
CRLGSI selection responses when their vectors of coefficients are normalized and
the estimated RGESIM selection responses for one, two, and three restrictions
for four simulated selection cycles. The averages of the estimated CRLGSI selection
responses of the traits and their associated GEBVs for each of the three null
restrictions were 3.24 for one restriction, 4.08 for two restrictions, and 5.06 for
three restrictions, whereas the averages of the estimated RGESIM selection
responses were 3.08 for one restriction, 2.79 for two restrictions, and 3.23 for
three restrictions. Note that although for one restriction the selection response was
similar for both indices, for two and three restrictions the CRLGSI selection
responses were greater than the RGESIM selection responses.

Table 8.5 presents the estimated CRLGSI and RGESIM expected genetic gains
per trait for four traits (T1, T2, T3, and T4) and their associated GEBVs (in this case
denoted by G1, G2, G3, and G4 to simplify the notation) in four simulated selection
cycles and for one, two, and three null restrictions.
Note that the null values of the traits and their restricted GEBVs are not shown in
Table 8.5 with the aim of simplifying the table. The averages of the estimated
CRLGSI expected genetic gains for the three traits and their associated GEBVs
were -2.60, 2.16, 2.84, -1.21, 0.67, and 1.02 for one restriction; 2.74, 3.23, 0.78,
and 0.99 for two restrictions; and 4.02 and 1.33 for three restrictions. On the other
hand, the averages of the estimated RGESIM expected genetic gains for the three
traits and their associated GEBVs were 3.27, -1.67, -1.33, 2.16, -0.92, and -0.84
for one restriction; 3.29, 1.02, 1.76, and 0.46 for two restrictions; and 3.53 and 2.07
for three restrictions. These results indicate that in terms of absolute values, the
estimated expected genetic gains for the traits and their associated GEBVs were
similar for both indices.


Table 8.4 Estimated combined null restricted linear genomic selection index (CRLGSI) and
estimated combined predetermined proportional gain linear genomic selection index (CPPG-LGSI)
selection responses for one, two, and three restrictions when their vectors of coefficients
are normalized for four simulated selection cycles. Estimated null restricted genomic eigen
selection index method (RGESIM) and predetermined proportional gain genomic eigen selection
index method (PPG-GESIM) selection responses for one, two, and three restrictions for four
simulated selection cycles. The selection intensity was 10% ($k_I = 1.755$)

          CRLGSI response for one, two,     CPPG-LGSI response for one, two,
          and three null restrictions       and three predetermined restrictions
Cycle     1       2       3                 1       2       3
1         3.25    4.09    4.89              5.36    2.80    1.81
2         3.28    4.19    5.21              5.07    3.64    1.99
3         2.91    3.89    4.97              5.37    3.86    1.42
4         3.93    4.17    5.15              4.52    3.38    1.20
Average   3.24    4.08    5.06              5.08    3.42    1.60

          RGESIM response for one, two,     PPG-GESIM response for one, two,
          and three null restrictions       and three predetermined restrictions
Cycle     1       2       3                 1       2       3
1         3.21    2.78    3.47              1.95    4.07    4.26
2         3.11    2.86    3.06              1.85    4.12    5.49
3         2.93    2.76    3.20              2.04    4.18    6.30
4         3.07    2.76    3.21              2.02    4.17    5.82
Average   3.08    2.79    3.23              1.96    4.14    5.47


Table 8.5 Estimated CRLGSI and estimated null RGESIM expected genetic gains per trait for four
traits (T1, T2, T3, and T4) and their associated genomic estimated breeding values (G1, G2, G3, and
G4) for four simulated selection cycles and for one, two, and three null restrictions. The
selection intensity was 10% ($k_I = 1.755$)

CRLGSI expected genetic gains for one, two, and three null restrictions
        One restriction (a)                          Two restrictions (b)         Three restrictions (c)
Cycle   T2      T3     T4     G2      G3     G4      T3     T4     G3     G4      T4     G4
1       -2.32   2.17   2.87   -1.48   0.73   1.24    2.60   3.38   0.86   1.15    4.08   1.50
2       -2.76   2.14   2.89   -1.19   0.76   0.96    2.81   3.30   0.87   0.98    3.95   1.25
3       -2.22   2.27   2.98   -1.15   0.62   0.97    2.77   3.14   0.69   0.90    3.93   1.33
4       -3.09   2.08   2.64   -1.05   0.58   0.92    2.80   3.08   0.70   0.93    4.13   1.24
Mean    -2.60   2.16   2.84   -1.21   0.67   1.02    2.74   3.23   0.78   0.99    4.02   1.33

RGESIM expected genetic gains for one, two, and three null restrictions
        One restriction (a)                          Two restrictions (b)         Three restrictions (c)
Cycle   T2      T3     T4     G2      G3     G4      T3     T4     G3     G4      T4     G4
1       3.27    -1.52  -1.24  2.48    -0.88  -1.00
2       3.30    -1.79  -1.41  2.10    -1.09  -0.82   3.26   1.34   1.82   0.66    3.41   2.00
3       2.98    -1.62  -1.44  2.13    -0.83  -0.75   3.31   0.86   1.70   0.21    3.45   2.05
4       3.56    -1.73  -1.23  1.92    -0.89  -0.78   3.40   0.96   1.62   0.53    3.58   2.02
Mean    3.27    -1.67  -1.33  2.16    -0.92  -0.84   3.29   1.02   1.76   0.46    3.53   2.07

(a) All T1 and G1 expected genetic gains were null
(b) All T1, T2, G1, and G2 expected genetic gains were null
(c) All T1, T2, T3, G1, G2, and G3 expected genetic gains were null


8.5 The Predetermined Proportional Gain Linear Genomic 
Eigen Selection Index Method 


The predetermined proportional gain linear genomic eigen selection index method
(PPG-GESIM) theory is based on the predetermined proportional gain linear
phenotypic ESIM (PPG-ESIM) described in Chap. 7. In the PPG-ESIM, the vector of
PPG (predetermined proportional gain) imposed by the breeder was
$\mathbf{d}' = [d_1 \;\; d_2 \;\cdots\; d_r]$. However, because the PPG-GESIM uses phenotypic
and GEBV information jointly to predict the net genetic merit, the vector of PPG
imposed by the breeder ($\mathbf{d}_{PG}$) should be twice as long as the standard vector $\mathbf{d}'$, that is,
$\mathbf{d}'_{PG} = [d_1 \;\; d_2 \;\cdots\; d_r \;\; d_{r+1} \;\; d_{r+2} \;\cdots\; d_{2r}]$, where we would expect that
if $d_1$ is the PPG imposed on trait 1, then $d_{r+1}$ should be the PPG imposed on the
GEBV associated with trait 1, etc. Thus, in the PPG-GESIM we have three possible
options for determining (for each trait and GEBV) the PPG: e.g., for trait 1, $d_1 = d_{r+1}$,
$d_1 > d_{r+1}$, or $d_1 < d_{r+1}$. This is the main difference between the standard PPG-ESIM
described in Chap. 7 and the PPG-GESIM.


8.5.1 The PPG-GESIM Parameters 


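Using the same procedure described for RGESIM and PPG-ESIM, the PPG-GESIM
vector of coefficients ($\boldsymbol{\beta}_{PG}$), which maximizes the PPG-GESIM selection response
and the expected genetic gain per trait, is the first eigenvector of the following
equation

\[
(\mathbf{T}_{PG} - \lambda^2_{PG}\mathbf{I}_{2t})\boldsymbol{\beta}_{PG} = \mathbf{0}, \tag{8.45}
\]

where $\mathbf{T}_{PG} = \mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A} + \mathbf{B}$,
$\mathbf{K}_{RG} = [\mathbf{I}_{2t} - \mathbf{Q}_{RG}]$, $\mathbf{I}_{2t}$ is an identity matrix of
size $2t \times 2t$,
$\mathbf{Q}_{RG} = \boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G)^{-1}\mathbf{U}'_G\mathbf{A}$,
$\mathbf{B} = \boldsymbol{\delta}\boldsymbol{\varphi}'$,
$\boldsymbol{\delta} = \boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G)^{-1}\mathbf{d}_{PG}$, and

\[
\boldsymbol{\varphi}' = \frac{\mathbf{d}'_{PG}(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G)^{-1}\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}}
{\mathbf{d}'_{PG}(\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G)^{-1}\mathbf{d}_{PG}}.
\]

When $\mathbf{B}$ is a null matrix, $\mathbf{T}_{PG} = \mathbf{K}_{RG}\boldsymbol{\Phi}^{-1}\mathbf{A}$ (matrix of the RGESIM),
and when $\mathbf{U}'_G$ is a null matrix, $\mathbf{T}_{PG} = \boldsymbol{\Phi}^{-1}\mathbf{A}$ (matrix of the GESIM); this
means that the PPG-GESIM includes the RGESIM and GESIM as particular cases. The
optimized PPG-GESIM index can be written as $I_{PG} = \boldsymbol{\beta}'_{PG}\mathbf{f}$.

The vector of coefficients of $H = \mathbf{w}'_{PG}\mathbf{a}$ in the PPG-GESIM can be written as

\[
\mathbf{w}_{PG} = \mathbf{A}^{-1}[\boldsymbol{\Phi} + \mathbf{Q}'_{PG}\mathbf{A}]\boldsymbol{\beta}_{PG}, \tag{8.46}
\]

where
$\mathbf{Q}'_{PG} = \mathbf{A}\mathbf{U}_G\mathbf{D}_G(\mathbf{D}'_G\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}\mathbf{A}\mathbf{U}_G\mathbf{D}_G)^{-1}\mathbf{D}'_G\mathbf{U}'_G\mathbf{A}\boldsymbol{\Phi}^{-1}$,
and

\[
\mathbf{D}'_G = \begin{bmatrix}
d_{2r} & 0 & \cdots & 0 & -d_1 \\
0 & d_{2r} & \cdots & 0 & -d_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & d_{2r} & -d_{2r-1}
\end{bmatrix}.
\]

Similar to RGESIM, it can be shown that the covariance between
$H_{PG} = \mathbf{w}'_{PG}\mathbf{a}$ and $I_{PG} = \boldsymbol{\beta}'_{PG}\mathbf{f}$ ($\sigma_{H_{PG}I_{PG}}$) is equal to
the variance of $I_{PG} = \boldsymbol{\beta}'_{PG}\mathbf{f}$
($\sigma^2_{I_{PG}} = \boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG}$), that is,
$\sigma_{H_{PG}I_{PG}} = \mathbf{w}'_{PG}\mathbf{A}\boldsymbol{\beta}_{PG} = \boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG} = \sigma^2_{I_{PG}}$.

The maximized correlation between $I_{PG}$ and $H_{PG}$, or PPG-GESIM accuracy, is

\[
\rho_{H_{PG}I_{PG}} = \frac{\sqrt{\boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG}}}{\sqrt{\mathbf{w}'_{PG}\mathbf{A}\mathbf{w}_{PG}}}, \tag{8.47}
\]

where $\mathbf{w}'_{PG}\mathbf{A}\mathbf{w}_{PG}$ is the variance of $H_{PG}$. Hereafter, to simplify the notation, we write
Eq. (8.47) as $\lambda_{PG}$.
The maximized selection response and the expected genetic gain per trait of the
PPG-GESIM are

\[
R_{PG} = k_I\sqrt{\boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG}} \tag{8.48}
\]

and

\[
\mathbf{E}_{PG} = k_I\frac{\mathbf{A}\boldsymbol{\beta}_{PG}}{\sqrt{\boldsymbol{\beta}'_{PG}\boldsymbol{\Phi}\boldsymbol{\beta}_{PG}}} \tag{8.49}
\]

respectively, where $\boldsymbol{\beta}_{PG}$ is the first eigenvector of Eq. (8.45).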
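
The role of matrix $\mathbf{D}'_G$ in Eq. (8.46) is easier to see on a small case: its rows are built so that $\mathbf{D}'_G\mathbf{d}_{PG} = \mathbf{0}$, which means that any vector of restricted gains annihilated by $\mathbf{D}'_G$ is proportional to $\mathbf{d}_{PG}$. The following minimal sketch assumes NumPy; the helper name `build_D_transpose` is illustrative and not from the text, and the PPG vector is the one used in the numerical example of Sect. 8.5.2.

```python
import numpy as np


def build_D_transpose(d):
    """Build D' of size (n - 1) x n from a PPG vector d of length n.

    Row i holds d_n in column i and -d_i in the last column, so D' d = 0.
    """
    d = np.asarray(d, dtype=float)
    n = d.size
    D_t = np.zeros((n - 1, n))
    D_t[np.arange(n - 1), np.arange(n - 1)] = d[-1]   # d_n on the leading diagonal
    D_t[:, -1] = -d[:-1]                               # -d_1, ..., -d_{n-1} in the last column
    return D_t


d_PG = np.array([7.0, -3.0, 3.5, -1.5])   # PPG vector for GY, EHT and their GEBVs
D_t = build_D_transpose(d_PG)
print(D_t)
print(D_t @ d_PG)
```

Since the matrix has n - 1 independent rows when the last element of the PPG vector is nonzero, its null space is one-dimensional and is spanned by the PPG vector itself.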


8.5.2 Numerical Examples 


The process for estimating PPG-GESIM parameters is similar to the method
described for estimating RGESIM parameters. With a selection intensity of
10% ($k_I = 1.755$), we compare the combined predetermined proportional
gain linear genomic selection index (CPPG-LGSI) and PPG-GESIM results
using the real maize (Zea mays) F2 population with 244 genotypes,
233 molecular markers, and three traits, GY (ton ha$^{-1}$), EHT (cm), and PHT
(cm), where

\[
\hat{\mathbf{P}} = \begin{bmatrix} 0.45 & 1.33 & 2.33 \\ 1.33 & 65.07 & 83.71 \\ 2.33 & 83.71 & 165.99 \end{bmatrix}, \quad
\hat{\mathbf{C}} = \begin{bmatrix} 0.07 & 0.61 & 1.06 \\ 0.61 & 17.93 & 22.75 \\ 1.06 & 22.75 & 44.53 \end{bmatrix}, \quad \text{and} \quad
\hat{\boldsymbol{\Gamma}} = \begin{bmatrix} 0.07 & 0.65 & 1.05 \\ 0.65 & 10.62 & 14.25 \\ 1.05 & 14.25 & 26.37 \end{bmatrix}
\]

are the estimated matrices of P, C, and Γ respectively, whereas
$\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1 \;\; 0 \;\; 0 \;\; 0]$ was the vector of economic
weights.
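The estimated CPPG-LGSI vector of coefficients was
$\hat{\boldsymbol{\beta}}_{CP} = \hat{\boldsymbol{\beta}}_{CR} + \hat{\theta}_{CP}\hat{\boldsymbol{\delta}}$ (see
Chap. 6 for additional details). Let
$\hat{\mathbf{A}} = \begin{bmatrix} \hat{\mathbf{C}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}} \end{bmatrix}$ and
$\hat{\boldsymbol{\Phi}} = \begin{bmatrix} \hat{\mathbf{P}} & \hat{\boldsymbol{\Gamma}} \\ \hat{\boldsymbol{\Gamma}} & \hat{\boldsymbol{\Gamma}} \end{bmatrix}$ be the
estimated block matrices and $\mathbf{d}'_{PG} = [7 \;\; -3 \;\; 3.5 \;\; -1.5]$ the vector of PPG
imposed by the breeder on the traits GY and EHT, and their associated genomic
estimated breeding values (GEBV$_{GY}$ and GEBV$_{EHT}$), and let

\[
\mathbf{U}'_C = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}
\]

be the matrix of null restrictions on the CPPG-LGSI
and $\mathbf{w}' = [5 \;\; -0.1 \;\; -0.1 \;\; 0 \;\; 0 \;\; 0]$ the vector of economic weights. It can be
shown that $\hat{\theta}_{CP} = -0.00009$ is the estimated value of the proportionality constant,
$\hat{\boldsymbol{\delta}}' = [-112.92 \;\; -72.16 \;\; 61.35 \;\; 231.79 \;\; 64.75 \;\; -61.35]$,
$\hat{\boldsymbol{\beta}}'_{CP} = [-0.01 \;\; 0.01 \;\; -0.01 \;\; 0.59 \;\; 0.09 \;\; -0.09]$ is the estimated CPPG-LGSI
vector of coefficients, and the estimated CPPG-LGSI can be written as

\[
\hat{I}_{CP} = -0.01\,\mathrm{GY} + 0.01\,\mathrm{EHT} - 0.01\,\mathrm{PHT} + 0.59\,\mathrm{GEBV}_{GY} + 0.09\,\mathrm{GEBV}_{EHT} - 0.09\,\mathrm{GEBV}_{PHT},
\]

where GEBV$_{GY}$, GEBV$_{EHT}$, and GEBV$_{PHT}$ are the GEBVs associated with traits
GY, EHT, and PHT respectively. The same procedure is valid for more than
two predetermined restrictions. The estimated CPPG-LGSI selection response
and expected genetic gain per trait were
$\hat{R}_{CP} = k_I\sqrt{\hat{\boldsymbol{\beta}}'_{CP}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{CP}} = 0.443$ and
$\hat{\mathbf{E}}'_{CP} = k_I\dfrac{\hat{\boldsymbol{\beta}}'_{CP}\hat{\mathbf{A}}}{\sqrt{\hat{\boldsymbol{\beta}}'_{CP}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{CP}}}
= [0.004 \;\; 0.002 \;\; -4.639 \;\; -0.002 \;\; 0.001 \;\; -4.326]$
respectively, whereas the estimated CPPG-LGSI accuracy is
$\hat{\rho}_{HI_{CP}} = \dfrac{\hat{\sigma}_{I_{CP}}}{\hat{\sigma}_H} = 0.234$.

Because the estimated value of the proportionality constant was negative
($\hat{\theta}_{CP} = -0.00009$), the expected genetic gains of the traits GY and EHT, and their associated
genomic estimated breeding values (GEBV$_{GY}$ and GEBV$_{EHT}$), which appeared in
the $\hat{\mathbf{E}}'_{CP}$ values, were not in accordance with the values of the vector of PPG imposed
by the breeder, $\mathbf{d}'_{PG} = [7 \;\; -3 \;\; 3.5 \;\; -1.5]$, as we would expect, and CPPG-LGSI
accuracy (0.234) was low. These results indicate that in the CPPG-LGSI, it is very
important for the estimated values of $\theta_{CP}$ to be positive (see Chaps. 3 and 6 for
details).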
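
The effect of a negative proportionality constant can be traced numerically. The sketch below assumes NumPy and illustrative variable names. The expression used for the proportionality constant is not given in this section; it is the form usually associated with the predetermined proportional gain indices of Chaps. 3 and 6 and should be read as an assumption. The sketch shows that the covariances between the index and the restricted entries equal the constant times the PPG vector, so a negative constant moves every restricted entry against the direction requested by the breeder. Results from the rounded matrices are only approximate.

```python
import numpy as np

P = np.array([[0.45, 1.33, 2.33], [1.33, 65.07, 83.71], [2.33, 83.71, 165.99]])
C = np.array([[0.07, 0.61, 1.06], [0.61, 17.93, 22.75], [1.06, 22.75, 44.53]])
G = np.array([[0.07, 0.65, 1.05], [0.65, 10.62, 14.25], [1.05, 14.25, 26.37]])
A = np.block([[C, G], [G, G]])
Phi = np.block([[P, G], [G, G]])
w = np.array([5.0, -0.1, -0.1, 0.0, 0.0, 0.0])

# Columns of U select GY, EHT, GEBV_GY and GEBV_EHT (positions 0, 1, 3, 4).
U = np.zeros((6, 4))
for col, row in enumerate([0, 1, 3, 4]):
    U[row, col] = 1.0
d = np.array([7.0, -3.0, 3.5, -1.5])             # PPG vector imposed by the breeder

Phi_inv_A = np.linalg.solve(Phi, A)              # Phi^{-1} A
S = U.T @ A @ Phi_inv_A @ U                      # U' A Phi^{-1} A U
beta = Phi_inv_A @ w                             # unrestricted CLGSI coefficients

delta = Phi_inv_A @ U @ np.linalg.solve(S, d)    # Phi^{-1} A U S^{-1} d
# Proportionality constant: assumed form, see the lead-in paragraph.
theta = (d @ np.linalg.solve(S, U.T @ A @ beta)) / (d @ np.linalg.solve(S, d))

K = np.eye(6) - Phi_inv_A @ U @ np.linalg.solve(S, U.T @ A)
beta_CP = K @ beta + theta * delta               # beta_CR + theta * delta

cov_restricted = U.T @ A @ beta_CP               # equals theta * d
print(theta)
print(cov_restricted / d)
```

Each ratio printed in the last line equals the proportionality constant, which makes the sign problem visible directly on the restricted traits.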

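In the PPG-GESIM, we need to find the solutions to equation
$(\hat{\mathbf{T}}_{PG} - \hat{\lambda}^2_{PG}\mathbf{I}_{2t})\hat{\boldsymbol{\beta}}_{PG} = \mathbf{0}$,
for $\hat{\lambda}^2_{PG}$ and $\hat{\boldsymbol{\beta}}_{PG}$ (see Eq. 8.45). The estimated PPG-GESIM vector of
coefficients was $\hat{\boldsymbol{\beta}}'_{PG} = [0.001 \;\; -0.050 \;\; 0.029 \;\; 0.975 \;\; 0.154 \;\; -0.157]$, which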


-01 0 0 0 0 0 
0 зо 0 0 0 
. | 0 02 0 0 0 : 
was transformed using matrix F — 0 00 -1 0 "E that is, we 
0 00 0 -1 0 


о 00 0 0 -l 
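changed the direction of the original vector. With the $\hat{\boldsymbol{\beta}}_{PG}$ values, we can estimate the
PPG-GESIM index as

\[
\hat{I}_{PG} = 0.001\,\mathrm{GY} - 0.05\,\mathrm{EHT} + 0.029\,\mathrm{PHT} + 0.975\,\mathrm{GEBV}_{GY} + 0.154\,\mathrm{GEBV}_{EHT} - 0.157\,\mathrm{GEBV}_{PHT},
\]

where GEBV$_{GY}$, GEBV$_{EHT}$, and GEBV$_{PHT}$ are the GEBVs associated with the
traits GY, EHT, and PHT respectively. The estimated PPG-GESIM selection
response, accuracy, and expected genetic gain per trait were
$\hat{R}_{PG} = k_I\sqrt{\hat{\boldsymbol{\beta}}'_{PG}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{PG}} = 0.696$,
$\hat{\rho}_{H_{PG}I_{PG}} = \dfrac{\hat{\sigma}_{I_{PG}}}{\hat{\sigma}_{H_{PG}}} = 0.843$, and
$\hat{\mathbf{E}}'_{PG} = k_I\dfrac{\hat{\boldsymbol{\beta}}'_{PG}\hat{\mathbf{A}}}{\sqrt{\hat{\boldsymbol{\beta}}'_{PG}\hat{\boldsymbol{\Phi}}\hat{\boldsymbol{\beta}}_{PG}}}
= [0.01 \;\; -1.00 \;\; -3.56 \;\; 0 \;\; -0.46 \;\; -3.98]$ respectively.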

Fig. 8.4 presents the frequency distribution of the 244 estimated PPG-GESIM
index values for two predetermined restrictions on the traits GY and EHT and their
associated GEBVs (GEBV$_{GY}$ and GEBV$_{EHT}$), for one selection cycle in an
environment for a real maize (Zea mays) F2 population with 233 molecular markers. Note
that the frequency distribution of the estimated PPG-GESIM index values
approaches normal distribution.

Now, with a selection intensity of 10% ($k_I = 1.755$) and a vector of predetermined
restrictions $\mathbf{d}'_{PG} = [7 \;\; -3 \;\; 5 \;\; 3.5 \;\; -1.5 \;\; 2.5]$, we compare the estimated
CPPG-LGSI and PPG-GESIM selection responses and expected genetic gains per
trait using the simulated data set described in Sect. 2.8.1 of Chap. 2. Traits T1, T2,
and T3 and their associated GEBVs (GEBV1, GEBV2, and GEBV3 respectively)
were restricted, but trait T4 and its associated GEBV4 were not restricted. For
this data set, matrix F was an identity matrix of size 8 x 8 for all four selection
cycles.

Fig. 8.4 Frequency distribution of the 244 estimated predetermined proportional gain genomic
eigen selection index method (PPG-GESIM) values for two predetermined restrictions on the traits
GY and EHT and their associated GEBVs, GEBV$_{GY}$ and GEBV$_{EHT}$, for one selection cycle in an
environment for a real maize (Zea mays) F2 population with 233 molecular markers

Table 8.4 presents the estimated CPPG-LGSI selection responses when their
vectors of coefficients are normalized, and the estimated PPG-GESIM selection
responses for one, two, and three predetermined restrictions for four simulated
selection cycles. The averages of the estimated CPPG-LGSI selection responses
were 5.08 for one restriction, 3.42 for two restrictions, and 1.60 for three restrictions,
whereas the averages of the estimated PPG-GESIM selection responses were 1.96
for one restriction, 4.14 for two restrictions, and 5.47 for three restrictions. For this
data set, when the number of restrictions increases, the estimated CPPG-LGSI 


Table 8.6 Estimated CPPG-LGSI expected genetic gains for one, two, and three restricted 
predetermined traits (Т1, Т2, and ТЗ) and for one, two, and three restricted predetermined 
GEBVs (GEBV1, GEBV2, and GEBV3) for four simulated selection cycles 


CPPG-LGSI expected genetic gain for one predetermined restriction 


Traits Genomic estimated breeding values 
Cycle Tl T2 T3 T4 GEBVI GEBV2 |GEBV3_ | GEBV4 
1 8.24 —3.62 |3.32 |2.26 |4.12 —2.33 1.75 1.09 
2 7.98 —4.06 |3.03  |2.68 3.99 —2.24 1.79 1.04 
3 8.61 —448 | 3.24 1.96 | 4.30 —2.32 1.70 0.98 
4 8.30 —4.34 |3.32 |2.04 |4.15 —2.16 1.62 0.92 
Average  |8.28 —4.12 13.238 |2.23 1414 —2.26 1.71 1.01 
CPPG-LGSI expected genetic gain for two predetermined restrictions 
Traits Genomic estimated breeding values 
Cycle Tl T2 T3 T4 GEBVI GEBV2 |GEBV3_ | GEBV4 
1 8.06 —3.46 |3.30 |2.06 | 4.03 —1.73 1.72 0.98 
2 8.17 —3.50 |3.08 |2.65 |4.09 —1.75 1.79 0.98 
3 8.88 —3.81 3.31 1.83 | 4.44 —1.90 1.72 0.90 
4 8.61 —3.69 |3.43 1.99 | 4.30 —1.84 1.65 0.87 
Average 8.43 —3.61 328 12.13 |422 —1.81 1.72 0.93 


CPPG-LGSI expected genetic gain for three predetermined restrictions 


Traits 
Cycle Tl T2 T3 T4 


Genomic estimated breeding values 
GEBVI GEBV2 | GEBV3 


GEBV4 


5.68 —2.43 4.06 
3 5.87 =2.52 4.20 1.98 0.79 
4 5.91 —2.53 4.22 2.00 0.83 


Average | 5.81 —249 4.15 2.26 0.89 


The selection intensity was 10% (k, = 1.755) and the vector of predetermined restrictions was 
dj; = [7 -3 5 3.5 -1.5 2.5]. Trait Т4 and its associated GEBV4 were not restricted 
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selection response tends to decrease, whereas the estimated PPG-GESIM selection 
response increases. 

Tables 8.6 and 8.7 present the estimated CPPG-LGSI and PPG-GESIM expected genetic
gains for one, two, and three predetermined restrictions respectively, for four
simulated selection cycles. The averages of the estimated CPPG-LGSI expected
genetic gains for the four traits and their four associated GEBVs were 8.28, -4.12,
3.23, 2.23, 4.14, -2.26, 1.71, and 1.01 for one restriction; 8.43, -3.61, 3.28, 2.13,
4.22, -1.81, 1.72, and 0.93 for two restrictions; and 5.81, -2.49, 4.15, 2.26, 2.90,
-1.24, 2.07, and 0.89 for three restrictions. On the other hand, the averages of the
estimated PPG-GESIM expected genetic gains for the four traits and their four
associated GEBVs were 6.97, -1.31, 1.78, 0.52, 5.64, -1.74, 1.75, and 0.58 for
one restriction; 6.93, -2.73, 1.29, 0.85, 5.75, -2.55, 1.49, and 0.79 for two
restrictions; and 8.12, -3.27, 2.99, 1.13, 2.19, -1.15, 1.30, and 0.45 for three
restrictions. These results indicate that the estimated CPPG-LGSI expected genetic
gains for the four traits and their four associated GEBVs were generally higher than
the estimated PPG-GESIM expected genetic gains for the four traits and their four
associated GEBVs.


Table 8.7 Estimated PPG-GESIM expected genetic gains for one, two, and three restricted traits
(T1, T2, and T3) and for one, two, and three restricted GEBVs (GEBV1, GEBV2, and GEBV3) for
four simulated selection cycles

PPG-GESIM expected genetic gain for one predetermined restriction
          Traits                          Genomic estimated breeding values
Cycle     T1      T2      T3      T4      GEBV1   GEBV2   GEBV3   GEBV4
1         6.89   -1.44    1.94    0.63    6.36   -1.89    2.04    0.62
2         6.71   -1.33    1.90    0.65    6.06   -2.00    1.97    0.75
3         7.09   -1.69    1.67    0.40    5.40   -1.72    1.63    0.55
4         7.18   -0.78    1.58    0.39    4.73   -1.34    1.35    0.39
Average   6.97   -1.31    1.78    0.52    5.64   -1.74    1.75    0.58

PPG-GESIM expected genetic gain for two predetermined restrictions
          Traits                          Genomic estimated breeding values
Cycle     T1      T2      T3      T4      GEBV1   GEBV2   GEBV3   GEBV4
1         6.61   -2.55    1.40    0.94    6.49   -2.80    1.75    0.87
2         5.67   -2.48    1.24    0.87    6.16   -2.84    1.70    0.91
3         7.35   -3.08    1.21    0.85    5.54   -2.49    1.37    0.82
4         8.10   -2.80    1.20    0.76    4.80   -2.08    1.16    0.56
Average   6.93   -2.73    1.29    0.85    5.75   -2.55    1.49    0.79

PPG-GESIM expected genetic gain for three predetermined restrictions
          Traits                          Genomic estimated breeding values
Cycle     T1      T2      T3      T4      GEBV1   GEBV2   GEBV3   GEBV4
1         7.21   -2.94    2.64    1.02    1.69   -1.10    1.07    0.45
2         7.71   -2.97    2.41    1.46    2.22   -1.15    1.21    0.45
3         8.72   -3.43    3.17    0.93    2.21   -1.06    1.34    0.42
4         8.85   -3.73    3.72    1.09    2.63   -1.29    1.60    0.48
Average   8.12   -3.27    2.99    1.13    2.19   -1.15    1.30    0.45

The selection intensity was 10% ($k_I = 1.755$) and the vector of predetermined restrictions was
$\mathbf{d}'_{PG} = [7 \;\; -3 \;\; 5 \;\; 3.5 \;\; -1.5 \;\; 2.5]$. Trait T4 and its associated GEBV4 were not restricted
Chapter 9
Multistage Linear Selection Indices


Abstract Multistage linear selection indices select individual traits available at
different times or stages and are applied mainly in animal and tree breeding, where
the traits under consideration become evident at different ages. The main indices are
the unrestricted, the restricted, and the predetermined proportional gain selection
indices. The restricted and predetermined proportional gain indices allow null and
predetermined restrictions to be imposed on the expected genetic gain (or multi-trait
selection response) values of some traits, whereas the rest of the traits change without
any restriction. The three indices can use phenotypic, genomic, or both sets of
information to predict the unobservable net genetic merit values of the candidates
for selection. All of them maximize the selection response and the expected genetic
gain for each trait, have maximum accuracy, are the best predictors of the net genetic
merit, and provide the breeder with an objective rule for evaluating and selecting
several traits simultaneously. The theory of the foregoing indices is based on the
independent culling method and on the linear phenotypic selection index, and is
described in this chapter in the phenotypic and genomic selection context. Their
theoretical results are validated in a two-stage breeding selection scheme using
real and simulated data.


9.1 Multistage Linear Phenotypic Selection Index 


In a similar manner to the linear phenotypic selection index (LPSI, Chap. 2), the
objectives of the multistage linear phenotypic selection index (MLPSI) are:


1. To predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$, where $\mathbf{g}' = [g_1 \;\; g_2 \;\; \ldots \;\; g_t]$ is the vector
of true breeding values of an individual for $t$ traits and $\mathbf{w}' = [w_1 \;\; w_2 \;\; \ldots \;\; w_t]$
is the vector of economic weights.

2. To select individuals with the highest H values at each stage as parents of the next
generation.

3. To maximize the MLPSI selection response and its expected genetic gain per
trait.

4. To provide the breeder with an objective rule for evaluating and selecting several
traits simultaneously.


When selection is based on all the individual traits of interest jointly, the LPSI
vector of coefficients that maximizes the selection response $R = k\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}$ and the
expected genetic gain per trait $\mathbf{E} = k\,\dfrac{\mathbf{C}\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}}$ is $\mathbf{b} = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$, where $\mathbf{C}$ and $\mathbf{P}$ are the
covariance matrices of the true breeding values ($\mathbf{g}$) and trait phenotypic values ($\mathbf{y}$)
respectively, and $k$ is the selection intensity. In MLPSI terminology, the LPSI is
called a one-stage selection index. The MLPSI is an extension of the LPSI theory to
the multistage selection context and, as we shall see, the MLPSI theoretical results
are very similar to the LPSI theoretical results described in Chap. 2.


9.1.1 The MLPSI Parameters for Two Stages 


Let $\mathbf{y}' = [y_1 \;\; y_2 \;\; \cdots \;\; y_t]$ be a vector with $t$ traits of interest and suppose that we can
select only $n_i$ of them ($n_i < t$) at stage $i$ ($i = 1, 2, \ldots, N$), such that after $N$ stages ($N < t$),
$\sum_{i=1}^{N} n_i = t$. Thus, for each stage we should have a selection index with a different
number of traits. For example, at stage $i$ the index would be $I_i = \sum_{j=1}^{n_i} b_{ij}y_{ij}$, and at
stage $N$ the index would be

$$I_N = \sum_{j=1}^{n_1} b_{1j}y_{1j} + \sum_{j=1}^{n_2} b_{2j}y_{2j} + \cdots + \sum_{j=1}^{n_N} b_{Nj}y_{Nj} = \sum_{i=1}^{N} I_i,$$

where the double subscript of $y_{ij}$ indicates that the $j$th trait is measured at stage $i$, so
that at each sub-index $I_i$, all the $n_i$ traits are measured at the same age.

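Suppose that there are four traits of interest and that $\mathbf{y}' = [y_1 \;\; y_2 \;\; y_3 \;\; y_4]$ is the
vector of observable phenotypic values and $\mathbf{g}' = [g_1 \;\; g_2 \;\; g_3 \;\; g_4]$ is the vector of
unobservable breeding values. If at the first and second stages we select two traits,
then $n_1 = n_2 = 2$ and $\mathbf{y}'$ can be partitioned as $\mathbf{y}' = [\mathbf{x}'_1 \;\; \mathbf{x}'_2]$, where $\mathbf{x}'_1 = [y_1 \;\; y_2]$
and $\mathbf{x}'_2 = [y_3 \;\; y_4]$ are the vectors of traits that become evident at the first and second
stages respectively. At the first stage, the phenotypic covariance matrix of $\mathbf{x}_1$ ($\mathbf{P}_1$)
and the covariance matrix of $\mathbf{x}_1$ with the vector of true breeding values $\mathbf{g}$ ($\mathbf{G}_1$) can be
written as

$$\mathbf{P}_1 = \begin{bmatrix} Var(y_1) & Cov(y_1, y_2) \\ Cov(y_2, y_1) & Var(y_2) \end{bmatrix}
\quad\text{and}\quad
\mathbf{G}_1 = \begin{bmatrix} Cov(y_1, g_1) & Cov(y_1, g_2) & Cov(y_1, g_3) & Cov(y_1, g_4) \\ Cov(y_2, g_1) & Cov(y_2, g_2) & Cov(y_2, g_3) & Cov(y_2, g_4) \end{bmatrix}$$

respectively. For the second stage, in addition to matrix $\mathbf{P}_1$, we need the phenotypic
covariance matrix between $\mathbf{x}_1$ and $\mathbf{x}_2$ ($\mathbf{P}_2$) and the phenotypic covariance matrix of
$\mathbf{x}_2$ ($\mathbf{P}_3$); thus, the covariance matrix of phenotypic values at stage 2 is
$\mathbf{P} = \begin{bmatrix} \mathbf{P}_1 & \mathbf{P}_2 \\ \mathbf{P}'_2 & \mathbf{P}_3 \end{bmatrix}$. In a similar manner, in addition to matrix $\mathbf{G}_1$, at stage 2 we
need the covariance between $\mathbf{x}_2$ and $\mathbf{g}$ ($\mathbf{G}_2$); that is, at stage 2 the covariance matrix
between phenotypic and breeding values can be written as $\mathbf{G} = \begin{bmatrix} \mathbf{G}_1 \\ \mathbf{G}_2 \end{bmatrix}$. Matrices $\mathbf{G}$
and $\mathbf{C}$ are not exactly the same, because although $\mathbf{C} = Var(\mathbf{g})$,
$\mathbf{G} = \begin{bmatrix} Cov(\mathbf{x}_1, \mathbf{g}) \\ Cov(\mathbf{x}_2, \mathbf{g}) \end{bmatrix} = \begin{bmatrix} \mathbf{G}_1 \\ \mathbf{G}_2 \end{bmatrix}$, and this latter matrix changes at each stage.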
2 


Let $\mathbf{w}' = [w_1 \;\; w_2 \;\; w_3 \;\; w_4]$ be the vector of economic weights; then, at the first
and second stages the MLPSI vectors of coefficients are $\mathbf{b}'_1 = \mathbf{w}'\mathbf{G}'_1\mathbf{P}_1^{-1} = [b_{11} \;\; b_{12}]$
and $\mathbf{b}'_2 = \mathbf{w}'\mathbf{G}'\mathbf{P}^{-1} = [b_{21} \;\; b_{22} \;\; b_{23} \;\; b_{24}]$ respectively. The selection
indices at stages 1 and 2 can be written as $I_1 = b_{11}y_1 + b_{12}y_2 = \mathbf{b}'_1\mathbf{x}_1$ and
$I_2 = b_{21}y_1 + b_{22}y_2 + b_{23}y_3 + b_{24}y_4 = \mathbf{b}'_2\mathbf{y}$, which could be correlated, and then
numerical integration would be required to find optimal truncation points and
selection intensities (Xu and Muir 1992; Hicks et al. 1998) before obtaining the
maximized MLPSI selection response and expected genetic gain per trait.

The accuracy of the MLPSI at stages 1 and 2 can be written as

$$\rho_{HI_1} = \sqrt{\frac{\mathbf{b}'_1\mathbf{P}_1\mathbf{b}_1}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad\text{and}\quad \rho_{HI_2} = \sqrt{\frac{\mathbf{b}'_2\mathbf{P}^*\mathbf{b}_2}{\mathbf{w}'\mathbf{C}^*\mathbf{w}}} \tag{9.1}$$

respectively. Let $k_1$ and $k_2$ be the selection intensities for stages 1 and 2; then, the
maximized MLPSI expected genetic gains per trait can be written as

$$\mathbf{E}_1 = k_1\frac{\mathbf{G}'_1\mathbf{b}_1}{\sqrt{\mathbf{b}'_1\mathbf{P}_1\mathbf{b}_1}} \quad\text{and}\quad \mathbf{E}_2 = k_2\frac{\mathbf{C}^*\mathbf{b}_2}{\sqrt{\mathbf{b}'_2\mathbf{P}^*\mathbf{b}_2}} \tag{9.2}$$

and the total expected genetic gain per trait for the two stages is equal to $\mathbf{E}_1 + \mathbf{E}_2$. In a
similar manner, the maximized selection responses for both stages are

$$R_1 = k_1\sqrt{\mathbf{b}'_1\mathbf{P}_1\mathbf{b}_1} \quad\text{and}\quad R_2 = k_2\sqrt{\mathbf{b}'_2\mathbf{P}^*\mathbf{b}_2}, \tag{9.3}$$

and the total selection response for the two stages is $R_1 + R_2$. In Eqs. (9.1) to (9.3),
matrices $\mathbf{P}^*$ and $\mathbf{C}^*$ are matrices $\mathbf{P}$ and $\mathbf{C}$ respectively, adjusted for previous
selection on $I_1 = \mathbf{b}'_1\mathbf{x}_1$. That is, the MLPSI accuracy, expected genetic gain per
trait, and selection response at stage 2 are affected by previous selection on $I_1$
(Saxton 1983) and it is necessary to adjust $\mathbf{P}$ and $\mathbf{C}$.

One method for adjusting matrices $\mathbf{P}$ and $\mathbf{C}$ has been provided by Cochran (1951)
and Cunningham (1975). Suppose that $X$, $Y$, and $W$ are three jointly normally
distributed random variables and that the covariance among them is known; then
the covariance between $X$ and $Y$ adjusted for the effects of selection on $W$ can be
obtained as

$$Cov(X, Y)^* = Cov(X, Y) - u\,\frac{Cov(X, W)\,Cov(Y, W)}{Var(W)}, \tag{9.4}$$

where $u = k_1(k_1 - \tau)$, $k_1$ is the selection intensity at stage 1, and $\tau$ is the truncation
point when $I_1 = \mathbf{b}'_1\mathbf{x}_1$ is applied. For example, if the selection intensity at the first
stage is 5%, $k_1 = 2.063$, $\tau = 1.645$, and $u = 0.862$ (Falconer and Mackay 1996,
Table A).
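As a minimal sketch of Eq. (9.4), the adjustment can be coded in a few lines. The function names are hypothetical, and the factor is taken as $u = k_1(k_1 - \tau)$, the reading that reproduces the value $u = 0.862$ quoted above for 5% selection; the covariances in the second call are invented solely to show the direction of the adjustment.

```python
def selection_adjustment(k1: float, tau: float) -> float:
    """Return u = k1 * (k1 - tau), the factor that scales the adjustment in Eq. (9.4)."""
    return k1 * (k1 - tau)


def adjusted_cov(cov_xy: float, cov_xw: float, cov_yw: float,
                 var_w: float, u: float) -> float:
    """Covariance of X and Y after truncation selection on W (Eq. 9.4)."""
    return cov_xy - u * cov_xw * cov_yw / var_w


# Values quoted in the text for 5% selected at stage 1.
u = selection_adjustment(k1=2.063, tau=1.645)
print(round(u, 3))  # 0.862

# Hypothetical covariances (not from the book): selection on W shrinks Cov(X, Y).
print(round(adjusted_cov(cov_xy=0.5, cov_xw=0.6, cov_yw=0.4, var_w=1.0, u=u), 4))
```

Applying the same rule to every pair of traits, with $W = I_1$, gives the matrix forms used for the second stage.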

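According to Dekkers (2014), with the result of Eq. (9.4), it is possible to obtain
matrices $\mathbf{P}^*$ and $\mathbf{C}^*$ using the following two equations:

$$\mathbf{P}^* = Var(\mathbf{y})^* = \mathbf{P} - u\,\frac{Cov(\mathbf{y}, \mathbf{x}_1)\mathbf{b}_1\mathbf{b}'_1 Cov(\mathbf{x}_1, \mathbf{y})}{\mathbf{b}'_1 Var(\mathbf{x}_1)\mathbf{b}_1}
= \mathbf{P} - u\,\frac{\begin{bmatrix}\mathbf{P}_1 \\ \mathbf{P}'_2\end{bmatrix}\mathbf{b}_1\mathbf{b}'_1\begin{bmatrix}\mathbf{P}_1 & \mathbf{P}_2\end{bmatrix}}{\mathbf{b}'_1\mathbf{P}_1\mathbf{b}_1} \tag{9.5}$$

and

$$\mathbf{C}^* = Var(\mathbf{g})^* = \mathbf{C} - u\,\frac{Cov(\mathbf{g}, \mathbf{x}_1)\mathbf{b}_1\mathbf{b}'_1 Cov(\mathbf{x}_1, \mathbf{g})}{\mathbf{b}'_1 Var(\mathbf{x}_1)\mathbf{b}_1}
= \mathbf{C} - u\,\frac{\mathbf{G}'_1\mathbf{b}_1\mathbf{b}'_1\mathbf{G}_1}{\mathbf{b}'_1\mathbf{P}_1\mathbf{b}_1}. \tag{9.6}$$

With the Eq. (9.5) result, the correlation between $I_1 = \mathbf{b}'_1\mathbf{x}_1$ and $I_2 = \mathbf{b}'_2\mathbf{y}$ is

$$Corr(I_1, I_2) = \rho_{12} = \frac{\mathbf{b}'_1\begin{bmatrix}\mathbf{P}_1 & \mathbf{P}_2\end{bmatrix}\mathbf{b}_2}{\sqrt{\mathbf{b}'_1\mathbf{P}_1\mathbf{b}_1}\sqrt{\mathbf{b}'_2\mathbf{P}\mathbf{b}_2}}, \tag{9.7}$$

where $\sqrt{\mathbf{b}'_1\mathbf{P}_1\mathbf{b}_1}$ and $\sqrt{\mathbf{b}'_2\mathbf{P}\mathbf{b}_2}$ are the standard deviations of
$I_1 = \mathbf{b}'_1\mathbf{x}_1$ and $I_2 = \mathbf{b}'_2\mathbf{y}$ respectively.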


9.1.2 The Selection Intensities 


Selection intensity $k$ is related to the height of the ordinate of the normal curve ($z$)
and the proportion selected ($p$) in the LPSI as $k = z/p$. In the multistage selection
context, it is usual to fix the total proportion to be selected ($p$) before selection is
carried out and then to determine the unknown proportion $q_i$ ($i = 1, 2, \ldots, N$) for each
stage under the restriction

$$p = \prod_{i=1}^{N} q_i, \tag{9.8}$$

where $N$ is the number of stages. In the two-stage selection scheme, we would have
$p = q_1q_2$. Based on the fixed proportion $p$ and the $\rho_{12}$ value (Eq. 9.7), Young (1964)
used the bivariate truncated normal distribution theory to obtain the selection
intensity for two stages. A truncated distribution is a conditional distribution
resulting when the domain of the parent distribution is restricted to a smaller region
(Hattaway 2010). In the multistage selection context, a truncation occurs when a
sample of individuals from the parent distribution are selected as parents for the next
selection cycle, thus creating a new population of individuals that follow a truncated
normal distribution.
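The one-stage relation $k = z/p$ and the restriction of Eq. (9.8) can be sketched with the Python standard library. The function names are hypothetical; `statistics.NormalDist` supplies the standard normal density and quantile function. The calls recover the intensities quoted in this book for 5% and 10% selection.

```python
from statistics import NormalDist
from typing import Tuple

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_stage_intensity(p: float) -> Tuple[float, float]:
    """Return (tau, k) for one-stage truncation selection of a proportion p."""
    tau = _STD_NORMAL.inv_cdf(1.0 - p)  # truncation point
    z = _STD_NORMAL.pdf(tau)            # height of the ordinate at tau
    return tau, z / p                   # k = z / p


def second_stage_proportion(p: float, q1: float) -> float:
    """Solve p = q1 * q2 (Eq. 9.8 with N = 2) for q2."""
    if not 0.0 < p <= q1 <= 1.0:
        raise ValueError("need 0 < p <= q1 <= 1")
    return p / q1


tau, k = one_stage_intensity(0.05)
print(round(tau, 3), round(k, 3))               # 1.645 2.063
print(round(one_stage_intensity(0.10)[1], 3))   # 1.755
print(second_stage_proportion(p=0.2, q1=0.5))   # 0.4
```

This covers only independent, one-index truncation; when the stage indices are correlated, the two-stage intensities require the bivariate treatment described next.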


Suppose that $I_1 = \mathbf{b}'_1\mathbf{x}_1$ and $I_2 = \mathbf{b}'_2\mathbf{y}$ have joint normal distribution and let $I_1$ and
$I_2$ be transformed as $v_1 = \dfrac{I_1 - \mu_{I_1}}{\sigma_{I_1}}$ and $v_2 = \dfrac{I_2 - \mu_{I_2}}{\sigma_{I_2}}$, with a mean of zero and a variance of
1, where $\mu_{I_1}$ and $\mu_{I_2}$ are the means, whereas $\sigma_{I_1}$ and $\sigma_{I_2}$ are the standard deviations
of $I_1$ and $I_2$ respectively. In this case, the method of selection is to retain
animals or plants with $v_1 \geq c_1$ at stage 1 and $v_1 + v_2 \geq c_2$ at stage 2, where $c_1$ and $c_2$
are truncation points for $I_1$ and $I_2$ respectively.

The selected population has bivariate left truncated normal distribution with a
probability density function given by $h(v_1, v_2) = \dfrac{f(v_1, v_2)}{p}$, where

$$f(v_1, v_2) = \frac{1}{2\pi\sqrt{1 - \rho_{12}^2}}\exp\left\{-\frac{1}{2(1 - \rho_{12}^2)}\left[v_1^2 + v_2^2 - 2\rho_{12}v_1v_2\right]\right\}$$

and $\rho_{12}$ is the correlation between $v_1$ and $v_2$. The fixed total proportion ($p$) before selection can
be written as $p = \int_{c_1}^{\infty}\int_{c_2 - v_1}^{\infty} f(v_1, v_2)\,dv_2\,dv_1$, where $c_1$ and $c_2$ are truncation points for
$I_1$ and $I_2$ respectively. Then, as $p$ is fixed, Young (1964) integrated by parts (Thomas
2014)

$$\int_{c_1}^{\infty}\int_{c_2 - v_1}^{\infty} f(v_1, v_2)\,dv_2\,dv_1 \tag{9.9}$$

and found the expectations of $v_1$ and $v_2$ in the selected population, writing the
selection intensity values for stages 1 ($k_1$) and 2 ($k_2$) as

$$k_1 = \frac{z(c_1)Q(a)}{p} + \frac{z(c_3)Q(b)\sqrt{(1 + \rho_{12})/2}}{p} \tag{9.10}$$

and

$$k_2 = \frac{\rho_{12}\,z(c_1)Q(a)}{p} + \frac{z(c_3)Q(b)\sqrt{(1 + \rho_{12})/2}}{p} \tag{9.11}$$

respectively, where $z(c_1) = \dfrac{\exp\{-0.5c_1^2\}}{\sqrt{2\pi}}$ and $z(c_3) = \dfrac{\exp\{-0.5c_3^2\}}{\sqrt{2\pi}}$ are the heights
of the ordinates of the standard normal distribution at the lowest values of $c_1$ and
$c_3 = \dfrac{c_2}{\sqrt{2(1 + \rho_{12})}}$, and $p$ is the total proportion of the population of animal or plant
lines selected; $a = \dfrac{c_2 - c_1(1 + \rho_{12})}{\sqrt{1 - \rho_{12}^2}}$ and $b = \dfrac{2c_1 - c_2}{\sqrt{2(1 - \rho_{12})}}$, whereas $Q(a) = 1 - \Phi(a)$
and $Q(b) = 1 - \Phi(b)$ are the complements of the standard normal distribution;
$\Phi(a) = \int_{-\infty}^{a}\dfrac{1}{\sqrt{2\pi}}\exp\{-0.5w^2\}\,dw$ and $\Phi(b) = \int_{-\infty}^{b}\dfrac{1}{\sqrt{2\pi}}\exp\{-0.5t^2\}\,dt$ are
probabilities of the standard normal distribution, i.e., $\Phi(a) = P(W \leq a)$ and
$\Phi(b) = P(T \leq b)$.

Young (1964) provided figures to obtain values of $c_1$ and $c_2$ when the $\rho_{12}$ values
are between -0.8 and 0.8, and the $p$ values are between 0.05 and 0.8. For example,
suppose that $\rho_{12} = 0.8$ and $p = 0.2$ (or 20%); then, according to Young (1964,
Fig. 9), $c_1 = 0.80$ and $c_2 = 1.6$, and to find the selection intensities for the first ($k_1$)
and second stages ($k_2$) we need to solve Eqs. (9.10) and (9.11). That is, as $c_1 = 0.80$,
$c_2 = 1.6$, $\rho_{12} = 0.8$, and $p = 0.2$, then $z(c_1) = \dfrac{\exp\{-0.5(0.8)^2\}}{\sqrt{2\pi}} = 0.290$,
$z(c_3) = \dfrac{\exp\{-0.5[(1.6)^2/2(1.8)]\}}{\sqrt{2\pi}} = 0.28$, $a = \dfrac{1.6 - 0.8(1.8)}{\sqrt{1 - 0.8^2}} = 0.27$, $b = \dfrac{2(0.8) - 1.6}{\sqrt{2(1 - 0.8)}} = 0$,
$\Phi(a) = 0.6064$, $\Phi(b) = 0.5$, $Q(a) = 1 - \Phi(a) = 0.3936$, and $Q(b) = 1 - \Phi(b) = 0.5$.
Based on these results, the selection intensities for stages 1 and 2 are


$$k_1 = \frac{(0.29)(0.3936)}{0.2} + \frac{(0.28)(0.5)(0.9)}{0.2} = 0.744 \quad\text{and}\quad
k_2 = \frac{(0.8)(0.29)(0.3936)}{0.2} + \frac{(0.28)(0.5)(0.9)}{0.2} = 0.721$$

respectively. Note that the values of $\Phi(a) = 0.6064$ and $\Phi(b) = 0.5$ can be obtained
from any table with values showing the area under the curve of the standard normal
distribution (e.g., Rausand and Høyland 2004, Table F.1).
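The intermediate quantities of this example can be checked without a printed normal table. The sketch below, with a hypothetical function name, evaluates $z(c_1)$, $z(c_3)$, $a$, $b$, $Q(a)$, and $Q(b)$ from the definitions that accompany Eqs. (9.10) and (9.11), using `statistics.NormalDist` in place of the table. Because $a$ is not rounded to 0.27 before $\Phi(a)$ is evaluated, $Q(a)$ comes out close to, but not identical with, the tabulated 0.3936.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def young_terms(c1: float, c2: float, rho12: float) -> dict:
    """Evaluate the building blocks of Eqs. (9.10) and (9.11)."""
    c3 = c2 / sqrt(2.0 * (1.0 + rho12))
    a = (c2 - c1 * (1.0 + rho12)) / sqrt(1.0 - rho12 ** 2)
    b = (2.0 * c1 - c2) / sqrt(2.0 * (1.0 - rho12))
    return {
        "z_c1": std_normal.pdf(c1),        # ordinate at c1
        "z_c3": std_normal.pdf(c3),        # ordinate at c3
        "a": a,
        "b": b,
        "Q_a": 1.0 - std_normal.cdf(a),    # upper-tail probability at a
        "Q_b": 1.0 - std_normal.cdf(b),    # upper-tail probability at b
    }


terms = young_terms(c1=0.80, c2=1.6, rho12=0.8)
for name, value in terms.items():
    print(f"{name} = {value:.4f}")
```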

One problem with Eqs. (9.10) and (9.11) is that they tend to overestimate
the selection intensity values, and therefore the selection response,
when the total proportion retained $p$ is lower than 10%. Cochran (1951) gave
two equations to obtain selection intensities in the two-stage context, but his
equations also overestimate the selection intensity values when $p$ is lower than
10%. To date, there is no accurate method for estimating selection intensities for
two or more stages in the MLPSI context. Mi et al. (2014) have developed an R
package called selectiongain that enables calculation of the OMLPSI selection
response for up to 20 selection stages. Selectiongain uses raw integration to obtain
the first moment of a lower truncated multivariate standard normal distribution and
then it estimates the OMLPSI selection response at each stage; however, this integral
requires complex numerical algorithms with no convergence criteria (Arismendi
2013) and could also overestimate the selection intensity at each stage.


9.1.3 Numerical Example 


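To illustrate the two-stage selection theory, we use the poultry data of Xu and Muir
(1992). This data set contains four traits: age at sexual maturity, defined as the age
(in days) at which the first trap-nested egg was laid ($y_1$); rate of lay, defined as 100 times
(total eggs in the laying period)/(total days in the laying period) ($y_2$); body weight
(in pounds) measured at 32 weeks of age ($y_3$); and average egg weight (in ounces per
dozen) of all the eggs laid up to 32 weeks of age ($y_4$). The estimated phenotypic and
genetic covariance matrices were

$$\hat{\mathbf{P}} = \begin{bmatrix} 137.178 & -90.957 & 0.136 & 0.564 \\ -90.957 & 201.558 & 1.103 & -1.231 \\ 0.136 & 1.103 & 0.202 & 0.104 \\ 0.564 & -1.231 & 0.104 & 2.874 \end{bmatrix}
\quad\text{and}\quad
\hat{\mathbf{C}} = \begin{bmatrix} 14.634 & -18.356 & -0.109 & 1.233 \\ -18.356 & 32.029 & 0.103 & -2.574 \\ -0.109 & 0.103 & 0.089 & 0.023 \\ 1.233 & -2.574 & 0.023 & 1.225 \end{bmatrix}$$

respectively, whereas the vector of economic weights for the four traits was
$\mathbf{w}' = [-3.555 \;\; 19.536 \;\; -113.746 \;\; 48.307]$.

Suppose that at the first and second stages we select two traits ($n_1 = n_2 = 2$); then,
$\mathbf{y}' = [\mathbf{x}'_1 \;\; \mathbf{x}'_2]$, where $\mathbf{x}'_1 = [y_1 \;\; y_2]$ and $\mathbf{x}'_2 = [y_3 \;\; y_4]$. The estimated phenotypic
($\hat{\mathbf{P}}_1$) and genetic ($\hat{\mathbf{G}}_1$) covariance matrices for the first stage were

$$\hat{\mathbf{P}}_1 = \begin{bmatrix} 137.178 & -90.957 \\ -90.957 & 201.558 \end{bmatrix}
\quad\text{and}\quad
\hat{\mathbf{G}}_1 = \begin{bmatrix} 14.634 & -18.356 & -0.109 & 1.233 \\ -18.356 & 32.029 & 0.103 & -2.574 \end{bmatrix}$$

respectively. For the first and second stages, the estimated MLPSI vectors of
coefficients were $\hat{\mathbf{b}}'_1 = \mathbf{w}'\hat{\mathbf{G}}'_1\hat{\mathbf{P}}_1^{-1} = [-0.918 \;\; 2.339]$ and
$\hat{\mathbf{b}}'_2 = \mathbf{w}'\hat{\mathbf{C}}\hat{\mathbf{P}}^{-1} = [-0.59 \;\; 2.78 \;\; -49.45 \;\; 3.75]$ respectively.

The estimated correlation between the estimated indices $\hat{I}_1 = \hat{\mathbf{b}}'_1\mathbf{x}_1$ and
$\hat{I}_2 = \hat{\mathbf{b}}'_2\mathbf{y}$ was

$$\hat{\rho}_{12} = \frac{\hat{\mathbf{b}}'_1\begin{bmatrix}\hat{\mathbf{P}}_1 & \hat{\mathbf{P}}_2\end{bmatrix}\hat{\mathbf{b}}_2}{\sqrt{\hat{\mathbf{b}}'_1\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1}\sqrt{\hat{\mathbf{b}}'_2\hat{\mathbf{P}}\hat{\mathbf{b}}_2}} = 0.88,$$

where $\sqrt{\hat{\mathbf{b}}'_1\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1}$ and $\sqrt{\hat{\mathbf{b}}'_2\hat{\mathbf{P}}\hat{\mathbf{b}}_2}$
were the estimated standard deviations of $\hat{I}_1$ and $\hat{I}_2$ respectively.
Assuming that $p = 0.2$ (or 20%), an approximate selection intensity for the first
stage was $k_1 = 0.744$, whence the estimated MLPSI selection response, expected
genetic gain per trait, and accuracy were $\hat{R}_1 = k_1\sqrt{\hat{\mathbf{b}}'_1\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1} = 29.85$,
$\hat{\mathbf{E}}'_1 = k_1\dfrac{\hat{\mathbf{b}}'_1\hat{\mathbf{G}}_1}{\sqrt{\hat{\mathbf{b}}'_1\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1}} = [-1.046 \;\; 1.702 \;\; 0.006 \;\; -0.133]$, and
$\hat{\rho}_{HI_1} = \sqrt{\dfrac{\hat{\mathbf{b}}'_1\hat{\mathbf{P}}_1\hat{\mathbf{b}}_1}{\mathbf{w}'\hat{\mathbf{C}}\mathbf{w}}} = 0.353$
respectively.

According to the $k_1 = 0.744$ value, the approximate value of $u$
was $u = 0.554$, and by Eqs. (9.5) and (9.6), the estimated adjusted
phenotypic ($\hat{\mathbf{P}}^*$) and genetic ($\hat{\mathbf{C}}^*$) covariance matrices for the second stage were

$$\hat{\mathbf{P}}^* = \begin{bmatrix} 97.682 & -26.241 & 0.422 & 0.168 \\ -26.241 & 95.518 & 0.634 & -0.582 \\ 0.422 & 0.634 & 0.200 & 0.107 \\ 0.168 & -0.582 & 0.107 & 2.870 \end{bmatrix}
\quad\text{and}\quad
\hat{\mathbf{C}}^* = \begin{bmatrix} 13.540 & -16.575 & -0.102 & 1.094 \\ -16.575 & 29.129 & 0.092 & -2.348 \\ -0.102 & 0.092 & 0.089 & 0.024 \\ 1.094 & -2.348 & 0.024 & 1.207 \end{bmatrix}.$$

For the second stage, the approximate selection intensity was $k_2 = 0.721$,
whereas the estimated MLPSI selection response, expected genetic gain per trait,
and accuracy were $\hat{R}_2 = k_2\sqrt{\hat{\mathbf{b}}'_2\hat{\mathbf{P}}^*\hat{\mathbf{b}}_2} = 24.84$,
$\hat{\mathbf{E}}'_2 = k_2\dfrac{\hat{\mathbf{b}}'_2\hat{\mathbf{C}}^*}{\sqrt{\hat{\mathbf{b}}'_2\hat{\mathbf{P}}^*\hat{\mathbf{b}}_2}} = [-0.443 \;\; 0.804 \;\; -0.087 \;\; -0.087]$, and
$\hat{\rho}_{HI_2} = \sqrt{\dfrac{\hat{\mathbf{b}}'_2\hat{\mathbf{P}}^*\hat{\mathbf{b}}_2}{\mathbf{w}'\hat{\mathbf{C}}^*\mathbf{w}}} = 0.314$ respectively.
Finally, the total estimated MLPSI selection response and expected
genetic gain per trait were $\hat{R}_1 + \hat{R}_2 = 54.69$ and $\hat{\mathbf{E}}'_1 + \hat{\mathbf{E}}'_2 =
[-1.488 \;\; 2.506 \;\; -0.081 \;\; -0.219]$.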
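The stage-1 quantities and the adjusted matrices of this example can be recomputed with a short pure-Python sketch of Eqs. (9.1), (9.3), and (9.5) to (9.7). Three things here are assumptions rather than part of the original example: the helper names are hypothetical; the second diagonal element of $\hat{\mathbf{P}}$ is taken as 201.558, the value consistent with the reported $\hat{\mathbf{b}}_1$ and $\hat{\mathbf{P}}^*$; and $k_1$, $k_2$, and $u$ are simply the values quoted in the text, not derived.

```python
from math import sqrt


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def matvec(a, x):
    return [sum(p * q for p, q in zip(row, x)) for row in a]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


P = [[137.178, -90.957, 0.136, 0.564],
     [-90.957, 201.558, 1.103, -1.231],   # 201.558: assumed value (see lead-in)
     [0.136, 1.103, 0.202, 0.104],
     [0.564, -1.231, 0.104, 2.874]]
C = [[14.634, -18.356, -0.109, 1.233],
     [-18.356, 32.029, 0.103, -2.574],
     [-0.109, 0.103, 0.089, 0.023],
     [1.233, -2.574, 0.023, 1.225]]
w = [-3.555, 19.536, -113.746, 48.307]
k1, k2, u = 0.744, 0.721, 0.554           # values quoted in the text

Cw = matvec(C, w)
P1 = [row[:2] for row in P[:2]]
G1 = C[:2]                                # Cov(x1, g): first two rows of C
b1 = solve(P1, Cw[:2])                    # b1 = P1^{-1} G1 w
b2 = solve(P, Cw)                         # b2 = P^{-1} C w

var_I1 = dot(b1, matvec(P1, b1))
R1 = k1 * sqrt(var_I1)
G1t_b1 = [G1[0][j] * b1[0] + G1[1][j] * b1[1] for j in range(4)]   # G1' b1
E1 = [k1 * g / sqrt(var_I1) for g in G1t_b1]

# Eqs. (9.5) and (9.6): rank-one adjustments for selection on I1.
Pyx_b1 = [P[i][0] * b1[0] + P[i][1] * b1[1] for i in range(4)]     # Cov(y, x1) b1
P_star = [[P[i][j] - u * Pyx_b1[i] * Pyx_b1[j] / var_I1 for j in range(4)]
          for i in range(4)]
C_star = [[C[i][j] - u * G1t_b1[i] * G1t_b1[j] / var_I1 for j in range(4)]
          for i in range(4)]

var_I2 = dot(b2, matvec(P, b2))
rho12 = dot(Pyx_b1, b2) / sqrt(var_I1 * var_I2)                    # Eq. (9.7)
var_I2_adj = dot(b2, matvec(P_star, b2))
R2 = k2 * sqrt(var_I2_adj)
acc1 = sqrt(var_I1 / dot(w, Cw))
acc2 = sqrt(var_I2_adj / dot(w, matvec(C_star, w)))

print([round(v, 3) for v in b1], round(R1, 2), round(rho12, 2))
print(round(P_star[0][0], 3), round(C_star[0][0], 3), round(acc1, 3))
```

Small differences in the last digit relative to the printed results are to be expected, because the text reports rounded coefficients.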


9.2 The Multistage Restricted Linear Phenotypic Selection 
Index 


The multistage restricted linear phenotypic selection index (MRLPSI) is an extension
of the null restricted linear phenotypic selection index (RLPSI) described in
Chap. 3 to the multistage case; thus, the theoretical results of the MRLPSI are very
similar to those of the RLPSI. The MRLPSI allows restrictions equal to zero to be
imposed on the expected genetic gains of some traits, whereas other traits increase
(or decrease) their expected genetic gains without any restrictions being imposed.


9.2.1 The MRLPSI Parameters for Two Stages 


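In Chap. 3, we indicated that vector $\mathbf{b}_R = \mathbf{K}\mathbf{b}$ is a linear transformation of the LPSI
vector of coefficients ($\mathbf{b}$) made by the projector matrix $\mathbf{K}$, and that matrix $\mathbf{K}$ is
idempotent ($\mathbf{K} = \mathbf{K}^2$) and projects $\mathbf{b}$ into a space smaller than the original space of $\mathbf{b}$.
The reduction of the space into which matrix $\mathbf{K}$ projects $\mathbf{b}$ is equal to the number of
zeros that appear in the expected genetic gain per trait. Hence, the MRLPSI vector
of coefficients for stages 1 and 2 should be a linear transformation of the MLPSI
vector of coefficients at stages 1 ($\mathbf{b}_1 = \mathbf{P}_1^{-1}\mathbf{G}_1\mathbf{w}$) and 2 ($\mathbf{b}_2 = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$) described in
Sect. 9.1.1 of this chapter, and should be written as

$$\mathbf{b}_{R_1} = \mathbf{K}_1\mathbf{b}_1 \tag{9.12}$$

and

$$\mathbf{b}_{R_2} = \mathbf{K}_2\mathbf{b}_2, \tag{9.13}$$

respectively, where, at stage 1, $\mathbf{K}_1 = [\mathbf{I}_1 - \mathbf{Q}_1]$, $\mathbf{Q}_1 = \mathbf{P}_1^{-1}\boldsymbol{\Psi}_1(\boldsymbol{\Psi}'_1\mathbf{P}_1^{-1}\boldsymbol{\Psi}_1)^{-1}\boldsymbol{\Psi}'_1$,
$\boldsymbol{\Psi}'_1 = \mathbf{U}'\mathbf{G}'_1$, $\mathbf{I}_1$ is an identity matrix of the same size as $\mathbf{P}_1$, and $\mathbf{P}_1^{-1}$ is the inverse of
matrix $\mathbf{P}_1$. At stage 2, $\mathbf{K}_2 = [\mathbf{I}_2 - \mathbf{Q}_2]$, $\mathbf{Q}_2 = \mathbf{P}^{-1}\boldsymbol{\Psi}_2(\boldsymbol{\Psi}'_2\mathbf{P}^{-1}\boldsymbol{\Psi}_2)^{-1}\boldsymbol{\Psi}'_2$, $\boldsymbol{\Psi}'_2 = \mathbf{U}'\mathbf{C}$, $\mathbf{I}_2$
is an identity matrix of the same size as $\mathbf{P}$, and $\mathbf{P}^{-1}$ is the inverse of matrix $\mathbf{P}$. By
Eqs. (9.12) and (9.13), the MRLPSI for stages 1 and 2 can be written as $I_{R_1} = \mathbf{b}'_{R_1}\mathbf{x}_1$
and $I_{R_2} = \mathbf{b}'_{R_2}\mathbf{y}$, where $\mathbf{y}' = [\mathbf{x}'_1 \;\; \mathbf{x}'_2]$; $\mathbf{x}_1$ and $\mathbf{x}_2$ are the vectors of traits that become
evident at the first and second stages respectively.

Let $k_1$ and $k_2$ be the selection intensities for stages 1 and 2 (Eqs. 9.10 and 9.11)
respectively, and let $\mathbf{P}^*$ and $\mathbf{C}^*$ be the covariance matrices adjusted in the MRLPSI
context according to Eqs. (9.5) and (9.6) respectively. The maximized MRLPSI
selection response, expected genetic gain per trait, and accuracy at stages 1 and 2 can
be written as

$$R_{R_1} = k_1\sqrt{\mathbf{b}'_{R_1}\mathbf{P}_1\mathbf{b}_{R_1}} \quad\text{and}\quad R_{R_2} = k_2\sqrt{\mathbf{b}'_{R_2}\mathbf{P}^*\mathbf{b}_{R_2}}, \tag{9.14}$$

$$\mathbf{E}_{R_1} = k_1\frac{\mathbf{G}'_1\mathbf{b}_{R_1}}{\sqrt{\mathbf{b}'_{R_1}\mathbf{P}_1\mathbf{b}_{R_1}}} \quad\text{and}\quad \mathbf{E}_{R_2} = k_2\frac{\mathbf{C}^*\mathbf{b}_{R_2}}{\sqrt{\mathbf{b}'_{R_2}\mathbf{P}^*\mathbf{b}_{R_2}}}, \tag{9.15}$$

and

$$\rho_{HI_{R_1}} = \sqrt{\frac{\mathbf{b}'_{R_1}\mathbf{P}_1\mathbf{b}_{R_1}}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad\text{and}\quad \rho_{HI_{R_2}} = \sqrt{\frac{\mathbf{b}'_{R_2}\mathbf{P}^*\mathbf{b}_{R_2}}{\mathbf{w}'\mathbf{C}^*\mathbf{w}}}, \tag{9.16}$$

respectively, whereas the total MRLPSI selection response and expected genetic
gain per trait for both stages are equal to $R_{R_1} + R_{R_2}$ and $\mathbf{E}_{R_1} + \mathbf{E}_{R_2}$.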


9.2.2 Numerical Examples 


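To illustrate the MRLPSI theory for a two-stage selection breeding scheme, we
use the real data set of the White Leghorn chickens of Hicks et al. (1998). This data
set comprises six traits ($y_1$ to $y_6$) that correspond to records of the
number of eggs laid during different periods: from week 0 through 4 ($y_1$), 4 through
8 ($y_2$), 8 through 28 ($y_3$), 28 through 32 ($y_4$), 32 through 36 ($y_5$), and 36 through
52 ($y_6$) respectively. The estimated phenotypic and genotypic covariance matrices
were

$$\hat{\mathbf{P}} = \begin{bmatrix} 102 & 32 & 14 & 4 & 3 & -1 \\ 32 & 80 & 80 & 16 & 17 & 7 \\ 14 & 80 & 298 & 78 & 112 & 62 \\ 4 & 16 & 78 & 66 & 80 & 51 \\ 3 & 17 & 112 & 80 & 135 & 49 \\ -1 & 7 & 62 & 51 & 49 & 98 \end{bmatrix}
\quad\text{and}\quad
\hat{\mathbf{C}} = \begin{bmatrix} 44 & 11 & -11 & -3 & -8 & -3 \\ 11 & 26 & 24 & 7 & 7 & 3 \\ -11 & 24 & 62 & 23 & 37 & 20 \\ -3 & 7 & 23 & 14 & 23 & 14 \\ -8 & 7 & 37 & 23 & 42 & 25 \\ -3 & 3 & 20 & 14 & 25 & 18 \end{bmatrix}$$

respectively, and $\mathbf{w}' = [0.08 \;\; 0.08 \;\; 0.38 \;\; 0.08 \;\; 0.08 \;\; 0.31]$ was the vector of
economic weights.

Let $\mathbf{y}' = [y_1 \;\; y_2 \;\; y_3 \;\; y_4 \;\; y_5 \;\; y_6]$ and $\mathbf{g}' = [g_1 \;\; g_2 \;\; g_3 \;\; g_4 \;\; g_5 \;\; g_6]$ be the
vectors of observed phenotypic and unobserved genotypic values respectively, and
suppose that at stage 1 we select four traits and at stage 2 we select two traits; then
$\mathbf{x}'_1 = [y_1 \;\; y_2 \;\; y_3 \;\; y_4]$ and $\mathbf{x}'_2 = [y_5 \;\; y_6]$ are the vectors of observations at stages
1 and 2 respectively, whereas $\mathbf{y}' = [\mathbf{x}'_1 \;\; \mathbf{x}'_2]$ is the vector of total observations at
stage 2. We need to estimate vectors $\mathbf{b}'_{R_1} = \mathbf{b}'_1\mathbf{K}'_1$ and $\mathbf{b}'_{R_2} = \mathbf{b}'_2\mathbf{K}'_2$, where $\mathbf{b}'_1 = \mathbf{w}'\mathbf{G}'_1\mathbf{P}_1^{-1}$
and $\mathbf{b}'_2 = \mathbf{w}'\mathbf{G}'\mathbf{P}^{-1}$. In Chap. 3, we described methods of estimating matrices
$\mathbf{K}_1 = [\mathbf{I}_1 - \mathbf{Q}_1]$, $\mathbf{Q}_1 = \mathbf{P}_1^{-1}\boldsymbol{\Psi}_1(\boldsymbol{\Psi}'_1\mathbf{P}_1^{-1}\boldsymbol{\Psi}_1)^{-1}\boldsymbol{\Psi}'_1$, $\boldsymbol{\Psi}'_1 = \mathbf{U}'\mathbf{G}'_1$, $\mathbf{K}_2 = [\mathbf{I}_2 - \mathbf{Q}_2]$,
$\mathbf{Q}_2 = \mathbf{P}^{-1}\boldsymbol{\Psi}_2(\boldsymbol{\Psi}'_2\mathbf{P}^{-1}\boldsymbol{\Psi}_2)^{-1}\boldsymbol{\Psi}'_2$, and $\boldsymbol{\Psi}'_2 = \mathbf{U}'\mathbf{C}$, which are used in this subsection.

At stage 1, the estimated phenotypic and genotypic covariance matrices were

$$\hat{\mathbf{P}}_1 = \begin{bmatrix} 102 & 32 & 14 & 4 \\ 32 & 80 & 80 & 16 \\ 14 & 80 & 298 & 78 \\ 4 & 16 & 78 & 66 \end{bmatrix}
\quad\text{and}\quad
\hat{\mathbf{G}}_1 = \begin{bmatrix} 44 & 11 & -11 & -3 & -8 & -3 \\ 11 & 26 & 24 & 7 & 7 & 3 \\ -11 & 24 & 62 & 23 & 37 & 20 \\ -3 & 7 & 23 & 14 & 22 & 14 \end{bmatrix}$$

respectively. At both stages, traits $y_1$ and $y_2$ are restricted. Matrix $\mathbf{U}'$ can
be written as $\mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}$, whence the estimated matrix of
restrictions was $\hat{\boldsymbol{\Psi}}'_1 = \mathbf{U}'\hat{\mathbf{G}}'_1 = \begin{bmatrix} 44 & 11 & -11 & -3 \\ 11 & 26 & 24 & 7 \end{bmatrix}$; therefore, the estimated
matrices $\hat{\mathbf{Q}}_1 = \hat{\mathbf{P}}_1^{-1}\hat{\boldsymbol{\Psi}}_1(\hat{\boldsymbol{\Psi}}'_1\hat{\mathbf{P}}_1^{-1}\hat{\boldsymbol{\Psi}}_1)^{-1}\hat{\boldsymbol{\Psi}}'_1$ and $\hat{\mathbf{K}}_1 = [\mathbf{I}_1 - \hat{\mathbf{Q}}_1]$ were

$$\hat{\mathbf{Q}}_1 = \begin{bmatrix} 0.923 & -0.013 & -0.511 & -0.144 \\ 0.164 & 1.026 & 1.093 & 0.317 \\ -0.145 & -0.069 & -0.001 & -0.001 \\ 0.010 & 0.159 & 0.178 & 0.052 \end{bmatrix}
\quad\text{and}\quad
\hat{\mathbf{K}}_1 = \begin{bmatrix} 0.077 & 0.013 & 0.511 & 0.144 \\ -0.164 & -0.026 & -1.093 & -0.317 \\ 0.145 & 0.069 & 1.001 & 0.001 \\ -0.010 & -0.159 & -0.178 & 0.948 \end{bmatrix},$$

where $\mathbf{I}_1$ is an identity matrix of size 4 x 4.

The estimated vector $\mathbf{b}_{R_1} = \mathbf{K}_1\mathbf{b}_1$ was $\hat{\mathbf{b}}'_{R_1} = \hat{\mathbf{b}}'_1\hat{\mathbf{K}}'_1 = [0.044 \;\; -0.095 \;\; 0.045 \;\; 0.131]$,
where $\hat{\mathbf{b}}'_1 = \mathbf{w}'\hat{\mathbf{G}}'_1\hat{\mathbf{P}}_1^{-1} = [-0.067 \;\; 0.125 \;\; 0.045 \;\; 0.167]$, and
$\hat{I}_{R_1} = \hat{\mathbf{b}}'_{R_1}\mathbf{x}_1$ was the estimated MRLPSI at stage 1. The estimated MRLPSI vector of
coefficients at stage 2 was $\hat{\mathbf{b}}'_{R_2} = \hat{\mathbf{b}}'_2\hat{\mathbf{K}}'_2 = [0.045 \;\; -0.068 \;\; 0.028 \;\; -0.057 \;\; 0.099 \;\; 0.106]$
and $\hat{I}_{R_2} = \hat{\mathbf{b}}'_{R_2}\mathbf{y}$ was the estimated MRLPSI at stage 2.

The estimated correlation value ($\hat{\rho}_{R_{12}}$) between $\hat{I}_{R_1} = \hat{\mathbf{b}}'_{R_1}\mathbf{x}_1$ and $\hat{I}_{R_2} = \hat{\mathbf{b}}'_{R_2}\mathbf{y}$ was

$$\hat{\rho}_{R_{12}} = \frac{\hat{\mathbf{b}}'_{R_1}\begin{bmatrix}\hat{\mathbf{P}}_1 & \hat{\mathbf{P}}_2\end{bmatrix}\hat{\mathbf{b}}_{R_2}}{\sqrt{\hat{\mathbf{b}}'_{R_1}\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{R_1}}\sqrt{\hat{\mathbf{b}}'_{R_2}\hat{\mathbf{P}}\hat{\mathbf{b}}_{R_2}}} = 0.564,$$

where $\sqrt{\hat{\mathbf{b}}'_{R_1}\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{R_1}}$ and $\sqrt{\hat{\mathbf{b}}'_{R_2}\hat{\mathbf{P}}\hat{\mathbf{b}}_{R_2}}$ are
the estimated standard deviations of $\hat{I}_{R_1}$ and $\hat{I}_{R_2}$
respectively. According to Young (1964, Fig. 8), and Eqs. (9.10) and (9.11),
the selection intensities for stages 1 and 2 were $k_1 = 0.641$ and $k_2 = 0.593$
respectively. The estimated selection responses and expected genetic gains per trait
for both stages were $\hat{R}_{R_1} = k_1\sqrt{\hat{\mathbf{b}}'_{R_1}\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{R_1}} = 0.973$ and
$\hat{R}_{R_2} = k_2\sqrt{\hat{\mathbf{b}}'_{R_2}\hat{\mathbf{P}}^*\hat{\mathbf{b}}_{R_2}} = 0.930$,
$\hat{\mathbf{E}}'_{R_1} = k_1\dfrac{\hat{\mathbf{b}}'_{R_1}\hat{\mathbf{G}}_1}{\sqrt{\hat{\mathbf{b}}'_{R_1}\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{R_1}}} = [0 \;\; 0 \;\; 1.271 \;\; 0.870 \;\; 1.482 \;\; 0.974]$ and
$\hat{\mathbf{E}}'_{R_2} = k_2\dfrac{\hat{\mathbf{b}}'_{R_2}\hat{\mathbf{C}}^*}{\sqrt{\hat{\mathbf{b}}'_{R_2}\hat{\mathbf{P}}^*\hat{\mathbf{b}}_{R_2}}} = [0 \;\; 0 \;\; 1.419 \;\; 1.014 \;\; 2.037 \;\; 1.349]$, whereas
$\hat{R}_{R_1} + \hat{R}_{R_2} = 1.903$ and $\hat{\mathbf{E}}'_{R_1} + \hat{\mathbf{E}}'_{R_2} = [0 \;\; 0 \;\; 2.691 \;\; 1.884 \;\; 3.519 \;\; 2.322]$ were the total estimated
MRLPSI selection response and expected genetic gain per trait respectively.

Finally, the estimated MRLPSI accuracy at stage 1 was
$\hat{\rho}_{HI_{R_1}} = \sqrt{\dfrac{\hat{\mathbf{b}}'_{R_1}\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{R_1}}{\mathbf{w}'\hat{\mathbf{C}}\mathbf{w}}} = 0.320$
and at stage 2 it was
$\hat{\rho}_{HI_{R_2}} = \sqrt{\dfrac{\hat{\mathbf{b}}'_{R_2}\hat{\mathbf{P}}^*\hat{\mathbf{b}}_{R_2}}{\mathbf{w}'\hat{\mathbf{C}}^*\mathbf{w}}} = 0.334$.
In this case, $\hat{\rho}_{HI_{R_2}} > \hat{\rho}_{HI_{R_1}}$. We
can explain these results considering that although $\hat{\rho}_{HI_{R_2}}$ was obtained with six traits,
$\hat{\rho}_{HI_{R_1}}$ was obtained only with four traits, two of them restricted.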
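The structure of the stage-1 projector can be illustrated with a small pure-Python sketch. The helper names are hypothetical, $\hat{\mathbf{G}}_1$ is entered with the signs of the first four rows of $\hat{\mathbf{C}}$, and the code checks only the defining properties of $\mathbf{K}_1$ (idempotency, rank reduction by the number of restrictions, and null gains on the restricted traits); it is not intended to reproduce the printed values to the last digit.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def restriction_projector(P, psi):
    """K = I - P^{-1} psi (psi' P^{-1} psi)^{-1} psi', with psi of size n x r."""
    p_inv_psi = matmul(inverse(P), psi)
    middle = inverse(matmul(transpose(psi), p_inv_psi))
    Q = matmul(matmul(p_inv_psi, middle), transpose(psi))
    n = len(P)
    return [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] for i in range(n)]


P1 = [[102, 32, 14, 4], [32, 80, 80, 16], [14, 80, 298, 78], [4, 16, 78, 66]]
G1 = [[44, 11, -11, -3, -8, -3],
      [11, 26, 24, 7, 7, 3],
      [-11, 24, 62, 23, 37, 20],
      [-3, 7, 23, 14, 22, 14]]
w = [0.08, 0.08, 0.38, 0.08, 0.08, 0.31]
restricted = [0, 1]                                   # zero-based positions of y1 and y2

psi1 = [[row[j] for j in restricted] for row in G1]   # G1 U, i.e. the transpose of Psi_1'
K1 = restriction_projector(P1, psi1)

G1w = [[sum(g * wi for g, wi in zip(row, w))] for row in G1]   # column vector G1 w
b1 = matmul(inverse(P1), G1w)                         # unrestricted stage-1 coefficients
bR1 = matmul(K1, b1)                                  # Eq. (9.12)
gain_direction = matmul(transpose(G1), bR1)           # proportional to E_R1 in Eq. (9.15)

print([round(row[0], 3) for row in bR1])
print([round(row[0], 6) for row in gain_direction])   # first two entries are zero
```

Because $\boldsymbol{\Psi}'_1\mathbf{K}_1 = \mathbf{0}$ by construction, the first two entries of $\mathbf{G}'_1\mathbf{b}_{R_1}$ vanish whatever the economic weights are.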


9.3 The Multistage Predetermined Proportional Gain 
Linear Phenotypic Selection Index 


The main objectives of the multistage predetermined proportional gain linear
phenotypic selection index (MPPG-LPSI) are the same as those of the predetermined
proportional gain linear phenotypic selection index (PPG-LPSI) described in
Chap. 3, i.e., to optimize, under some predetermined restrictions, the expected
genetic gains per trait, to predict the net genetic merit, and to select the individuals
with the highest net genetic merit values as parents of the next generation under
some predetermined restrictions. The MPPG-LPSI allows restrictions different from
zero to be imposed on the expected genetic gains of some traits, whereas other traits
increase (or decrease) their expected genetic gains without any restrictions being
imposed.


9.3.1 The MPPG-LPSI Parameters 


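In a similar manner to the MRLPSI, the MPPG-LPSI vector of coefficients for stages
1 and 2 should be a linear transformation of the MLPSI vector of coefficients at
stages 1 ($\mathbf{b}_1 = \mathbf{P}_1^{-1}\mathbf{G}_1\mathbf{w}$) and 2 ($\mathbf{b}_2 = \mathbf{P}^{-1}\mathbf{C}\mathbf{w}$), and should be written as

$$\mathbf{b}_{M_1} = \mathbf{K}_{M_1}\mathbf{b}_1 \tag{9.17}$$

and

$$\mathbf{b}_{M_2} = \mathbf{K}_{M_2}\mathbf{b}_2, \tag{9.18}$$

respectively, where, at stage 1, $\mathbf{K}_{M_1} = [\mathbf{I}_1 - \mathbf{Q}_{M_1}]$, $\mathbf{Q}_{M_1} = \mathbf{P}_1^{-1}\mathbf{M}_1(\mathbf{M}'_1\mathbf{P}_1^{-1}\mathbf{M}_1)^{-1}\mathbf{M}'_1$,
$\mathbf{M}'_1 = \mathbf{D}'\boldsymbol{\Psi}'_1$, $\boldsymbol{\Psi}'_1 = \mathbf{U}'\mathbf{G}'_1$, $\mathbf{I}_1$ is an identity matrix of the same size as $\mathbf{P}_1$, and $\mathbf{P}_1^{-1}$ is
the inverse of matrix $\mathbf{P}_1$. At stage 2, $\mathbf{K}_{M_2} = [\mathbf{I}_2 - \mathbf{Q}_{M_2}]$, $\mathbf{Q}_{M_2} = \mathbf{P}^{-1}\mathbf{M}_2(\mathbf{M}'_2\mathbf{P}^{-1}\mathbf{M}_2)^{-1}\mathbf{M}'_2$,
$\mathbf{M}'_2 = \mathbf{D}'\boldsymbol{\Psi}'_2$, $\boldsymbol{\Psi}'_2 = \mathbf{U}'\mathbf{C}$, $\mathbf{I}_2$ is an identity matrix of the same size as $\mathbf{P}$, $\mathbf{P}^{-1}$ is the inverse
of matrix $\mathbf{P}$, and

$$\mathbf{D}' = \begin{bmatrix} d_r & 0 & \cdots & 0 & -d_1 \\ 0 & d_r & \cdots & 0 & -d_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & d_r & -d_{r-1} \end{bmatrix},$$

where $d_q$ ($q = 1, 2, \ldots, r$) is the $q$th
element of $\mathbf{d}' = [d_1 \;\; d_2 \;\; \cdots \;\; d_r]$, the vector of PPG (predetermined proportional gains)
imposed by the breeder (see Chap. 3 for details).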
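The construction of $\mathbf{D}'$ from $\mathbf{d}'$ can be sketched as follows. The function name is hypothetical; the vector $\mathbf{d}' = [2 \;\; 3 \;\; 5]$ and the rows of $\boldsymbol{\Psi}'_1$ are those of the numerical example in Sect. 9.3.2, where traits $y_1$, $y_2$, and $y_3$ are restricted. By construction $\mathbf{D}'\mathbf{d} = \mathbf{0}$, which is what makes the restricted gains proportional to $\mathbf{d}$.

```python
def ppg_restriction_matrix(d):
    """Build the (r-1) x r matrix D' from d' = [d_1 ... d_r]."""
    r = len(d)
    if r < 2:
        raise ValueError("at least two predetermined gains are needed")
    rows = []
    for q in range(r - 1):
        row = [0.0] * r
        row[q] = d[-1]      # d_r on the diagonal
        row[-1] = -d[q]     # -d_q in the last column
        rows.append(row)
    return rows


d = [2.0, 3.0, 5.0]
D_t = ppg_restriction_matrix(d)
print(D_t)                                        # rows (5, 0, -2) and (0, 5, -3)

# D' d = 0: the rows of D' are orthogonal to the vector of predetermined gains.
print([sum(x * y for x, y in zip(row, d)) for row in D_t])

# Psi_1' = U'G_1' for restricted traits y1, y2, y3 (rows taken from Sect. 9.3.2).
psi1_t = [[44, 11, -11, -3],
          [11, 26, 24, 7],
          [-11, 24, 62, 23]]
M1_t = [[sum(D_t[i][q] * psi1_t[q][j] for q in range(3)) for j in range(4)]
        for i in range(2)]
print(M1_t)                                       # M_1' = D' Psi_1'
```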

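By Eqs. (9.17) and (9.18), the MPPG-LPSI for stages 1 and 2 can be written as
$I_{M_1} = \mathbf{b}'_{M_1}\mathbf{x}_1$ and $I_{M_2} = \mathbf{b}'_{M_2}\mathbf{y}$ respectively, where, assuming that at stage 1 we select
four traits and at stage 2 we select two traits, $\mathbf{x}'_1 = [y_1 \;\; y_2 \;\; y_3 \;\; y_4]$ and $\mathbf{x}'_2 =
[y_5 \;\; y_6]$ are the vectors of phenotypic observations at stages 1 and 2 respectively,
and $\mathbf{y}' = [\mathbf{x}'_1 \;\; \mathbf{x}'_2]$ is the vector of total phenotypic observations at stage 2.

Let $k_1$ and $k_2$ be the selection intensities for stages 1 and 2 (Eqs. 9.10 and 9.11)
respectively and let $\mathbf{P}^*$ and $\mathbf{C}^*$ be the adjusted matrices according to Eqs. (9.5) and
(9.6) in the MPPG-LPSI context. Then, the MPPG-LPSI selection response and
expected genetic gain per trait for both stages can be written as

$$R_{M_1} = k_1\sqrt{\mathbf{b}'_{M_1}\mathbf{P}_1\mathbf{b}_{M_1}} \quad\text{and}\quad R_{M_2} = k_2\sqrt{\mathbf{b}'_{M_2}\mathbf{P}^*\mathbf{b}_{M_2}} \tag{9.19}$$

and

$$\mathbf{E}_{M_1} = k_1\frac{\mathbf{G}'_1\mathbf{b}_{M_1}}{\sqrt{\mathbf{b}'_{M_1}\mathbf{P}_1\mathbf{b}_{M_1}}} \quad\text{and}\quad \mathbf{E}_{M_2} = k_2\frac{\mathbf{C}^*\mathbf{b}_{M_2}}{\sqrt{\mathbf{b}'_{M_2}\mathbf{P}^*\mathbf{b}_{M_2}}}, \tag{9.20}$$

respectively, whereas the total MPPG-LPSI selection response and expected genetic
gain per trait for both stages are equal to $R_{M_1} + R_{M_2}$ and $\mathbf{E}_{M_1} + \mathbf{E}_{M_2}$. In addition, the
MPPG-LPSI accuracy for both stages can be written as

$$\rho_{HI_{M_1}} = \sqrt{\frac{\mathbf{b}'_{M_1}\mathbf{P}_1\mathbf{b}_{M_1}}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad\text{and}\quad \rho_{HI_{M_2}} = \sqrt{\frac{\mathbf{b}'_{M_2}\mathbf{P}^*\mathbf{b}_{M_2}}{\mathbf{w}'\mathbf{C}^*\mathbf{w}}}. \tag{9.21}$$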
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9.3.2 Numerical Examples 


We use the real data set described in Sect. 9.2.2 to illustrate the theoretical results of 
the MPPG-LPSI in the same form as we did with those of the MRLPSI. We need to 
estimate vectors by, = bi Ky, and Dy, = bi Ky,» where b, = wG'P;! and 
b, = w'G'P^!. In Chap. 3 we have given methods to estimates Ky = [I — Ом, 
Qu = P 'M(M'P M) M, M' = DF, and V = ОС, which will be used in this 
subsection. 

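The estimated phenotypic and genotypic covariance matrices at stage 1 were 

$$\hat{\mathbf{P}}_1 = \begin{bmatrix} 102 & 32 & 14 & 4 \\ 32 & 80 & 80 & 16 \\ 14 & 80 & 298 & 78 \\ 4 & 16 & 78 & 66 \end{bmatrix} \quad\text{and}\quad \hat{\mathbf{G}}_1 = \begin{bmatrix} 44 & 11 & -11 & -3 & -8 & -3 \\ 11 & 26 & 24 & 7 & 7 & 3 \\ -11 & 24 & 62 & 23 & 37 & 20 \\ -3 & 7 & 23 & 14 & 22 & 14 \end{bmatrix}$$

respectively, whereas $\mathbf{w}' = [0.08 \;\; 0.08 \;\; 0.38 \;\; 0.08 \;\; 0.08 \;\; 0.31]$ was the vector 
of economic weights. The traits restricted at both stages are $y_1$, $y_2$, and $y_3$. The 
vector of PPG was $\mathbf{d}' = [2 \;\; 3 \;\; 5]$, whence 

$$\mathbf{D}' = \begin{bmatrix} 5 & 0 & -2 \\ 0 & 5 & -3 \end{bmatrix} \quad\text{and}\quad \mathbf{U}' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$

were matrices $\mathbf{D}'$ and $\mathbf{U}'$. The estimated matrices of $\mathbf{M}_1'$ and $\mathbf{K}_{M_1} = [\mathbf{I} - \mathbf{Q}_{M_1}]$ were 

$$\hat{\mathbf{M}}_1' = \mathbf{D}'\hat{\boldsymbol{\Psi}}_1' = \begin{bmatrix} 242 & 7 & -178 & -61 \\ 88 & 58 & -66 & -34 \end{bmatrix} \quad\text{and}\quad \hat{\mathbf{K}}_{M_1} = \begin{bmatrix} 0.176 & 0.205 & 0.606 & 0.159 \\ 0.031 & 0.032 & -0.007 & 0.199 \\ 0.195 & 0.235 & 0.852 & -0.098 \\ 0.130 & 0.130 & -0.098 & 0.940 \end{bmatrix}$$

respectively, where $\hat{\boldsymbol{\Psi}}_1' = \mathbf{U}'\hat{\mathbf{G}}_1'$. 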
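As a minimal sketch of how $\mathbf{M}_1' = \mathbf{D}'\mathbf{U}'\mathbf{G}_1'$ is assembled, the following NumPy snippet rebuilds it from the rounded matrices of this example. Two points are assumptions rather than statements from the text: $\mathbf{D}'$ is taken to have the Mallard form (row $q$ equal to $d_r$ in position $q$ and $-d_q$ in the last position), and the first row of $\hat{\mathbf{G}}_1$ is taken with negative signs in positions 3 to 6. Because the inputs are rounded, entries may differ from the reported values by about one unit.

```python
import numpy as np

# Predetermined proportional gains for the three restricted traits (from the text).
d = np.array([2.0, 3.0, 5.0])

# Assumed Mallard-type matrix D' of size (r-1) x r: [d_r * I | -d_1, ..., -d_(r-1)].
r = d.size
D_t = np.hstack([d[-1] * np.eye(r - 1), -d[:-1].reshape(-1, 1)])

# U' picks the first three of the six traits.
U_t = np.hstack([np.eye(3), np.zeros((3, 3))])

# Rounded estimate of G_1 (4 stage-1 traits x 6 traits); row-1 signs are an assumption.
G1 = np.array([
    [44, 11, -11, -3, -8, -3],
    [11, 26, 24, 7, 7, 3],
    [-11, 24, 62, 23, 37, 20],
    [-3, 7, 23, 14, 22, 14],
], dtype=float)

Psi1_t = U_t @ G1.T   # Psi_1' = U'G_1'  (3 x 4)
M1_t = D_t @ Psi1_t   # M_1'  = D'Psi_1' (2 x 4)

print(D_t)
print(M1_t)
```

The only entry that differs from the reported $\hat{\mathbf{M}}_1'$ is the third element of the first row, which is consistent with rounding of the inputs.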
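At stage 1, the estimated MPPG-LPSI vector of coefficients was 
$\hat{\mathbf{b}}_{M_1}' = \hat{\mathbf{b}}_1'\hat{\mathbf{K}}_{M_1}' = [0.068 \;\; 0.035 \;\; 0.039 \;\; 0.160]$, where $\hat{\mathbf{b}}_1' = \mathbf{w}'\hat{\mathbf{G}}_1'\hat{\mathbf{P}}_1^{-1} = [-0.067 \;\; 0.125 \;\; 0.045 \;\; 0.167]$, whence the estimated MPPG-LPSIs at stages 1 and 2 were 
$\hat{I}_{M_1} = \hat{\mathbf{b}}_{M_1}'\mathbf{x}_1$ and $\hat{I}_{M_2} = \hat{\mathbf{b}}_{M_2}'\mathbf{y}$. The estimated correlation value ($\hat{\rho}_{12}$) between $\hat{I}_{M_1}$ and $\hat{I}_{M_2}$ was 

$$\hat{\rho}_{12} = \frac{\widehat{Cov}(\hat{I}_{M_1}, \hat{I}_{M_2})}{\sqrt{\hat{\mathbf{b}}_{M_1}'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{M_1}}\sqrt{\hat{\mathbf{b}}_{M_2}'\hat{\mathbf{P}}\hat{\mathbf{b}}_{M_2}}} = 0.870,$$

where $\sqrt{\hat{\mathbf{b}}_{M_1}'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{M_1}}$ and $\sqrt{\hat{\mathbf{b}}_{M_2}'\hat{\mathbf{P}}\hat{\mathbf{b}}_{M_2}}$ were the estimated standard deviations of 
$\hat{I}_{M_1}$ and $\hat{I}_{M_2}$ respectively. According to Young (1964, Fig. 8), the 
selection intensities for stages 1 and 2 were $k_1 = 0.744$ and $k_2 = 0.721$ (Eqs. 9.10 and 
9.11) respectively. 

The estimated selection responses and expected genetic gains per trait for both 
stages were $\hat{R}_{M_1} = k_1\sqrt{\hat{\mathbf{b}}_{M_1}'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{M_1}} = 1.553$ and $\hat{R}_{M_2} = k_2\sqrt{\hat{\mathbf{b}}_{M_2}'\hat{\mathbf{P}}^*\hat{\mathbf{b}}_{M_2}} = 1.401$, 

$$\hat{\mathbf{E}}_{M_1}' = k_1\frac{\hat{\mathbf{b}}_{M_1}'\hat{\mathbf{G}}_1}{\sqrt{\hat{\mathbf{b}}_{M_1}'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{M_1}}} = [0.877 \;\; 1.316 \;\; 2.193 \;\; 1.128 \;\; 1.655 \;\; 1.037]$$

and 

$$\hat{\mathbf{E}}_{M_2}' = k_2\frac{\hat{\mathbf{b}}_{M_2}'\hat{\mathbf{C}}^*}{\sqrt{\hat{\mathbf{b}}_{M_2}'\hat{\mathbf{P}}^*\hat{\mathbf{b}}_{M_2}}} = [0.878 \;\; 1.346 \;\; 2.604 \;\; 1.433 \;\; 2.506 \;\; 1.602],$$

whereas $\hat{R}_{M_1} + \hat{R}_{M_2} = 2.954$ and $\hat{\mathbf{E}}_{M_1}' + \hat{\mathbf{E}}_{M_2}' = [1.755 \;\; 2.662 \;\; 4.797 \;\; 2.561 \;\; 4.161 \;\; 2.639]$ 
were the total estimated MPPG-LPSI selection response and expected genetic gain 
per trait respectively. Note that the vector of predetermined restrictions was 
$\mathbf{d}' = [2 \;\; 3 \;\; 5]$. This means that the MPPG-LPSI efficiency at predicting the total 
expected genetic gain per trait was high because the differences between each 
predetermined value (2, 3, and 5) and the total of each predicted value (1.755, 
2.662, and 4.797) were 0.245, 0.338, and 0.203 respectively. 

Finally, the estimated MPPG-LPSI accuracy at stage 1 was $\hat{\rho}_{M_1} = \sqrt{\hat{\mathbf{b}}_{M_1}'\hat{\mathbf{P}}_1\hat{\mathbf{b}}_{M_1} / \mathbf{w}'\hat{\mathbf{C}}\mathbf{w}} = 0.435$, and at stage 2 it was $\hat{\rho}_{M_2} = \sqrt{\hat{\mathbf{b}}_{M_2}'\hat{\mathbf{P}}^*\hat{\mathbf{b}}_{M_2} / \mathbf{w}'\hat{\mathbf{C}}^*\mathbf{w}} = 0.428$; that is, both were very 
similar. 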
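The stage 1 figures above can be cross-checked with a few lines of NumPy. The sketch below assumes that the vector $[-0.067 \;\; 0.125 \;\; 0.045 \;\; 0.167]$ is the unrestricted stage 1 vector $\hat{\mathbf{b}}_1$, so that $\hat{\mathbf{b}}_{M_1} = \hat{\mathbf{K}}_{M_1}\hat{\mathbf{b}}_1$; it then adds the two vectors of expected gains and compares the totals for the restricted traits with $\mathbf{d}$. All inputs are the rounded values of this example, so agreement is only up to rounding.

```python
import numpy as np

# Rounded estimate of the projector K_M1 reported in the example.
K_M1 = np.array([
    [0.176, 0.205, 0.606, 0.159],
    [0.031, 0.032, -0.007, 0.199],
    [0.195, 0.235, 0.852, -0.098],
    [0.130, 0.130, -0.098, 0.940],
])
# Assumed to be the unrestricted stage-1 coefficient vector b_1.
b1 = np.array([-0.067, 0.125, 0.045, 0.167])

# MPPG-LPSI coefficients at stage 1: b_M1 = K_M1 b_1.
b_M1 = K_M1 @ b1

# Expected genetic gains per trait at each stage (from the text) and their total.
E_M1 = np.array([0.877, 1.316, 2.193, 1.128, 1.655, 1.037])
E_M2 = np.array([0.878, 1.346, 2.604, 1.433, 2.506, 1.602])
E_total = E_M1 + E_M2

# Shortfall of the total gain with respect to the predetermined gains d.
d = np.array([2.0, 3.0, 5.0])
shortfall = d - E_total[:3]

print(np.round(b_M1, 3))
print(np.round(E_total, 3))
print(np.round(shortfall, 3))
```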


9.4 The Multistage Linear Genomic Selection Index 


We describe the multistage linear genomic selection indices (MLGSI) as an exten- 
sion of the linear genomic selection index (LGSI, Chap. 5) theory to the multistage 
genomic selection context; thus, the theoretical results of the MLGSI are very similar 
to those of the LGSI. The MLGSI is a linear combination of genomic estimated 
breeding values (GEBVs) and is useful for predicting individual net genetic merit 
and for selecting individuals from a nonphenotyped testing population as parents of 
the next selection cycle. 


9.4.1 The MLGSI Parameters 


The objective of the MLGSI is to predict the net genetic merit $H = \mathbf{w}'\mathbf{g}$, where $\mathbf{g}$ is a 
vector of true breeding values and $\mathbf{w}'$ is the vector of economic weights, using only 
GEBVs. In Chap. 5, we indicated that the covariance between $\gamma_i$ and $g_i$ is equal to the 
variance of $\gamma_i$, i.e., $Cov(g_i, \gamma_i) = s_i^2$, and that the GEBV associated with the $i$th trait is 
a predictor of the $i$th vector of genomic breeding values ($\boldsymbol{\gamma}_i$). In the testing population, 
the only observable information is $\mathbf{w}'$ and the GEBV associated with the traits 
of interest. For this reason, in practice, we construct a linear combination of GEBVs, 
which should be a good predictor of $H = \mathbf{w}'\mathbf{g}$. 

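Suppose that the breeder is interested in four traits, and that 
$\boldsymbol{\gamma}' = [\gamma_1 \;\; \gamma_2 \;\; \gamma_3 \;\; \gamma_4]$, $\mathbf{g}' = [g_1 \;\; g_2 \;\; g_3 \;\; g_4]$, and $\mathbf{w}' = [w_1 \;\; w_2 \;\; w_3 \;\; w_4]$ 
are the vectors of genomic breeding values ($\boldsymbol{\gamma}$), true breeding values ($\mathbf{g}$), and 
economic weights ($\mathbf{w}$) respectively. Let 

$$\boldsymbol{\Gamma} = Var(\boldsymbol{\gamma}) = \begin{bmatrix} s_1^2 & s_{12} & s_{13} & s_{14} \\ s_{21} & s_2^2 & s_{23} & s_{24} \\ s_{31} & s_{32} & s_3^2 & s_{34} \\ s_{41} & s_{42} & s_{43} & s_4^2 \end{bmatrix} \quad\text{and}\quad \mathbf{C} = Var(\mathbf{g}) = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}$$

be the covariance matrices of $\boldsymbol{\gamma}$ and $\mathbf{g}$ respectively. In a 
two-stage selection breeding scheme, $\boldsymbol{\gamma}' = [\gamma_1 \;\; \gamma_2 \;\; \gamma_3 \;\; \gamma_4]$ can be partitioned 
into $\boldsymbol{\gamma}_1' = [\gamma_1 \;\; \gamma_2]$ and $\boldsymbol{\gamma}_2' = [\gamma_3 \;\; \gamma_4]$; therefore, at stage 1, 

$$\boldsymbol{\Gamma}_1 = Var(\boldsymbol{\gamma}_1) = \begin{bmatrix} s_1^2 & s_{12} \\ s_{21} & s_2^2 \end{bmatrix}$$

is the genomic covariance matrix of $\boldsymbol{\gamma}_1' = [\gamma_1 \;\; \gamma_2]$ and 

$$Cov(\boldsymbol{\gamma}_1, \mathbf{g}) = \begin{bmatrix} s_1^2 & s_{12} & s_{13} & s_{14} \\ s_{21} & s_2^2 & s_{23} & s_{24} \end{bmatrix} = \mathbf{A}_1$$

is the covariance matrix of $\boldsymbol{\gamma}_1' = [\gamma_1 \;\; \gamma_2]$ with 
$\mathbf{g}' = [g_1 \;\; g_2 \;\; g_3 \;\; g_4]$. Matrix $\mathbf{A}_1$ indicates that we are assuming that the 
covariance between $\gamma_i$ and $g_j$ ($i, j = 1, 2, \ldots, g$; $g$ = number of genotypes) is equal 
to the covariance between $\gamma_i$ and $\gamma_j$. This is because, in practice, in the testing 
population, we can only estimate matrix $\boldsymbol{\Gamma}$. 

At stage 2, $\boldsymbol{\Gamma} = Var(\boldsymbol{\gamma})$ is the covariance matrix of $\boldsymbol{\gamma}$ and $\mathbf{A} = \boldsymbol{\Gamma}$ is the covariance 
matrix of the vector of genomic breeding values $\boldsymbol{\gamma}$ with the vector of breeding values 
$\mathbf{g}$. The MLGSI vectors of coefficients at stages 1 and 2 are 
$\boldsymbol{\beta}_1' = \mathbf{w}'\mathbf{A}_1'\boldsymbol{\Gamma}_1^{-1} = [\beta_{11} \;\; \beta_{12}]$ and $\boldsymbol{\beta}_2' = \mathbf{w}'\mathbf{A}\boldsymbol{\Gamma}^{-1} = \mathbf{w}' = [w_1 \;\; w_2 \;\; w_3 \;\; w_4]$ respectively, 
and the MLGSI for both stages can be written as $I_1 = \beta_{11}\gamma_1 + \beta_{12}\gamma_2 = \boldsymbol{\beta}_1'\boldsymbol{\gamma}_1$ 
and $I_2 = w_1\gamma_1 + w_2\gamma_2 + w_3\gamma_3 + w_4\gamma_4 = \mathbf{w}'\boldsymbol{\gamma}$. 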

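Let $k_1$ and $k_2$ be the MLGSI selection intensities for stages 1 and 2. For both 
stages, the MLGSI accuracies ($\rho_{HI_1}$ and $\rho_{HI_2}$), expected genetic gains per trait ($\mathbf{E}_1$ and 
$\mathbf{E}_2$) and selection responses ($R_1$ and $R_2$) can be written as 

$$\rho_{HI_1} = \sqrt{\frac{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad\text{and}\quad \rho_{HI_2} = \sqrt{\frac{\mathbf{w}'\boldsymbol{\Gamma}^*\mathbf{w}}{\mathbf{w}'\mathbf{C}^*\mathbf{w}}}, \tag{9.22}$$

$$\mathbf{E}_1 = k_1\frac{\mathbf{A}_1'\boldsymbol{\beta}_1}{\sqrt{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1}} \quad\text{and}\quad \mathbf{E}_2 = k_2\frac{\boldsymbol{\Gamma}^*\mathbf{w}}{\sqrt{\mathbf{w}'\boldsymbol{\Gamma}^*\mathbf{w}}}, \tag{9.23}$$

and 

$$R_1 = k_1\sqrt{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1} \quad\text{and}\quad R_2 = k_2\sqrt{\mathbf{w}'\boldsymbol{\Gamma}^*\mathbf{w}}. \tag{9.24}$$

The total MLGSI expected genetic gain per trait and selection response at both 
stages are equal to $\mathbf{E}_1 + \mathbf{E}_2$ and $R_1 + R_2$. To simplify notation, in Eqs. (9.23) and 
(9.24), we have omitted the intervals between stages or selection cycles ($L_G$). 
Matrices $\boldsymbol{\Gamma}^*$ and $\mathbf{C}^*$ in Eqs. (9.22) to (9.24) are matrices $\boldsymbol{\Gamma}$ and $\mathbf{C}$ adjusted for 
previous selection on $I_1$. 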

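We adjust matrices $\boldsymbol{\Gamma}$ and $\mathbf{C}$ for previous selection on $I_1$ as 

$$\boldsymbol{\Gamma}^* = \boldsymbol{\Gamma} - u\frac{\mathbf{A}_1'\boldsymbol{\beta}_1\boldsymbol{\beta}_1'\mathbf{A}_1}{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1} \tag{9.25}$$

and 

$$\mathbf{C}^* = \mathbf{C} - u\frac{\mathbf{G}_1'\mathbf{b}_1\mathbf{b}_1'\mathbf{G}_1}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1}, \tag{9.26}$$

respectively, where $u = k_1(k_1 - \tau)$, $k_1$ is the standardized selection differential, and $\tau$ 
is the truncation point when $I_1 = \boldsymbol{\beta}_1'\boldsymbol{\gamma}_1$ is applied. All the terms in Eq. (9.26) were 
defined in Eq. (9.6). 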
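To make the adjustment in Eq. (9.25) concrete, the following is a minimal NumPy sketch. The function name `adjust_for_stage1` is hypothetical, the matrix values are the rounded estimates used later in Sect. 9.4.3, and the value `u = 0.554` is an assumption: it is not reported in the text and was chosen here only because it gives an adjusted matrix close to the one reported in that example.

```python
import numpy as np

def adjust_for_stage1(Gamma, A1, Gamma1, beta1, u):
    """Eq. (9.25): Gamma* = Gamma - u * A1' b1 b1' A1 / (b1' Gamma1 b1)."""
    v = A1.T @ beta1  # covariance of the stage-1 index with each trait
    return Gamma - u * np.outer(v, v) / (beta1 @ Gamma1 @ beta1)

# Rounded genomic covariance matrix of the four traits.
Gamma = np.array([
    [16.26, -6.51, 5.60, 2.29],
    [-6.51, 5.79, -2.33, -1.62],
    [5.60, -2.33, 3.75, 0.94],
    [2.29, -1.62, 0.94, 2.62],
])
A1 = Gamma[:2, :]       # Cov(gamma_1, g), assumed equal to the first two rows of Gamma
Gamma1 = Gamma[:2, :2]  # Var(gamma_1)
w = np.array([1.0, -1.0, 1.0, 1.0])

# Stage-1 coefficients: beta_1 = Gamma_1^{-1} A_1 w.
beta1 = np.linalg.solve(Gamma1, A1 @ w)

u = 0.554  # assumed value of k1 * (k1 - tau); not given in the text
Gamma_star = adjust_for_stage1(Gamma, A1, Gamma1, beta1, u)
print(np.round(Gamma_star, 2))
```

In practice $u$ would be computed from the selection intensity and the truncation point of the stage 1 index rather than fixed by hand.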

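The correlation between $I_1 = \boldsymbol{\beta}_1'\boldsymbol{\gamma}_1$ and $I_2 = \mathbf{w}'\boldsymbol{\gamma}$ can be written as 

$$Corr(I_1, I_2) = \frac{\boldsymbol{\beta}_1'\mathbf{A}_1\mathbf{w}}{\sqrt{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1}\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}} = \rho_{12}, \tag{9.27}$$

where $\sqrt{\boldsymbol{\beta}_1'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_1}$ and $\sqrt{\mathbf{w}'\boldsymbol{\Gamma}\mathbf{w}}$ are the standard deviations of $I_1 = \boldsymbol{\beta}_1'\boldsymbol{\gamma}_1$ 
and $I_2 = \mathbf{w}'\boldsymbol{\gamma}$ respectively. In Eq. (9.27), matrix $\boldsymbol{\Gamma}$ was not adjusted according to 
Eq. (9.25). 
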


9.4.2 Estimating the Genomic Covariance Matrix 


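All the MLGSI parameters are associated with matrix $\boldsymbol{\Gamma}$; thus, the estimation of this 
matrix in the testing population is very important. We estimate matrix $\boldsymbol{\Gamma}$ according to 
the estimation method described in Chap. 5 (Eq. 5.25), that is, as 

$$\hat{\boldsymbol{\Gamma}}_l = \{\hat{\sigma}_{\gamma_{qq'}}\}, \tag{9.28}$$

where $\hat{\sigma}_{\gamma_{qq'}} = \frac{1}{g}(\hat{\boldsymbol{\gamma}}_{lq} - \mathbf{1}\hat{\mu}_{\gamma_{lq}})'\mathbf{G}_l^{-1}(\hat{\boldsymbol{\gamma}}_{lq'} - \mathbf{1}\hat{\mu}_{\gamma_{lq'}})$ is the estimated covariance between 
$\boldsymbol{\gamma}_{lq} = \mathbf{X}_l\mathbf{u}_q$ and $\boldsymbol{\gamma}_{lq'} = \mathbf{X}_l\mathbf{u}_{q'}$ at stage $l$ or selection cycle of the testing population; $g$ is 
the number of genotypes; $\hat{\mu}_{\gamma_{lq}}$ and $\hat{\mu}_{\gamma_{lq'}}$ are the estimated arithmetic means of the 
values of $\hat{\boldsymbol{\gamma}}_{lq}$ and $\hat{\boldsymbol{\gamma}}_{lq'}$; $\mathbf{1}$ is a $g \times 1$ vector of 1s and $\mathbf{G}_l = c^{-1}\mathbf{X}_l\mathbf{X}_l'$ is the additive 
genomic relationship matrix at stage $l$ or selection cycle in the testing population (see 
Chap. 5 for details). 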
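
The estimator in Eq. (9.28) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `genomic_covariance`, the toy marker matrix `X`, the scaling constant `c`, the simulated marker effects, and the small ridge added to keep the relationship matrix invertible are all hypothetical and are not taken from the data sets used in this book.

```python
import numpy as np

def genomic_covariance(gebv, G):
    """Estimate Gamma from a g x t matrix of GEBVs and a g x g relationship matrix G.

    Entry (q, q') is (1/g) * (gamma_q - mean)' G^{-1} (gamma_q' - mean).
    """
    g = gebv.shape[0]
    centered = gebv - gebv.mean(axis=0)
    return centered.T @ np.linalg.solve(G, centered) / g

rng = np.random.default_rng(1)
n_genotypes, n_markers, n_traits = 50, 200, 3

# Hypothetical coded marker matrix and scaling constant c.
X = rng.choice([-1.0, 0.0, 1.0], size=(n_genotypes, n_markers))
c = float(n_markers)
G = X @ X.T / c + 1e-6 * np.eye(n_genotypes)  # tiny ridge: numerical safeguard only

# Hypothetical marker effects; GEBVs are X times the estimated effects.
effects = 0.1 * rng.normal(size=(n_markers, n_traits))
gebv = X @ effects

Gamma_hat = genomic_covariance(gebv, G)
print(np.round(Gamma_hat, 3))
```

When $\mathbf{G}_l$ is the identity matrix, the estimator reduces to the usual covariance matrix of the GEBVs with divisor $g$, which is a convenient sanity check.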
9.4.3 Numerical Examples 


We illustrate the MLGSI theoretical results using the data described in Chap. 2, 
Sect. 2.8.1 simulated for eight phenotypic and seven genomic selection cycles, 
each with four traits ($T_1$, $T_2$, $T_3$, and $T_4$), 500 genotypes, four replicates for 
each genotype, 2500 molecular markers, and 315 quantitative trait loci in one 
environment. The economic weights of $T_1$, $T_2$, $T_3$, and $T_4$ were 1, $-1$, 1, and 
1 respectively. In this subsection, and only for illustrative purposes, we use the 
data set from cycle 1. 

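The genotypic and genomic estimated covariance matrices in cycle 1 were 

$$\hat{\mathbf{C}} = \begin{bmatrix} 36.21 & -12.93 & 8.35 & 2.74 \\ -12.93 & 13.04 & -3.4 & -2.24 \\ 8.35 & -3.4 & 9.96 & 0.16 \\ 2.74 & -2.24 & 0.16 & 6.64 \end{bmatrix} \quad\text{and}\quad \hat{\boldsymbol{\Gamma}} = \begin{bmatrix} 16.26 & -6.51 & 5.60 & 2.29 \\ -6.51 & 5.79 & -2.33 & -1.62 \\ 5.60 & -2.33 & 3.75 & 0.94 \\ 2.29 & -1.62 & 0.94 & 2.62 \end{bmatrix}$$

respectively, whereas $\mathbf{w}' = [1 \;\; -1 \;\; 1 \;\; 1]$ was the vector of economic weights. 
Matrices $\mathbf{P}$ and $\mathbf{C}$ were obtained according to Eqs. (2.22) to (2.24), whereas matrix 
$\boldsymbol{\Gamma}$ was obtained according to Eq. (9.28). 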
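Suppose that we select two traits at stages 1 and 2. Then, at stage 1, 

$$\hat{\boldsymbol{\Gamma}}_1 = \begin{bmatrix} 16.26 & -6.51 \\ -6.51 & 5.79 \end{bmatrix} \quad\text{and}\quad \hat{\mathbf{A}}_1 = \begin{bmatrix} 16.26 & -6.51 & 5.60 & 2.29 \\ -6.51 & 5.79 & -2.33 & -1.62 \end{bmatrix}$$

were the estimated covariance matrices of $\boldsymbol{\Gamma}_1$ and $\mathbf{A}_1$ respectively, and the estimated 
MLGSI vector of coefficients was $\hat{\boldsymbol{\beta}}_1' = \mathbf{w}'\hat{\mathbf{A}}_1'\hat{\boldsymbol{\Gamma}}_1^{-1} = [1.39 \;\; -1.25]$. Because at 
stage 2 $\hat{\boldsymbol{\beta}}_2' = \mathbf{w}'\hat{\mathbf{A}}\hat{\boldsymbol{\Gamma}}^{-1} = \mathbf{w}' = [w_1 \;\; w_2 \;\; w_3 \;\; w_4]$, the estimated MLGSI vector of 
coefficients is the vector of economic weights. Thus, 

$$\hat{\rho}_{12} = \frac{\hat{\boldsymbol{\beta}}_1'\hat{\mathbf{A}}_1\mathbf{w}}{\sqrt{\hat{\boldsymbol{\beta}}_1'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_1}\sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}\mathbf{w}}} = 0.97$$

was the estimated correlation between $\hat{I}_1 = \hat{\boldsymbol{\beta}}_1'\hat{\boldsymbol{\gamma}}_1$ and $\hat{I}_2 = \mathbf{w}'\hat{\boldsymbol{\gamma}}$, and assuming 
that the fixed proportion was 0.2 (20%), $k_1 = 0.744$ and $k_2 = 0.721$ were 
the approximated selection intensities for stages 1 and 2 respectively. The 
adjusted matrices $\hat{\boldsymbol{\Gamma}}^*$ and $\hat{\mathbf{C}}^*$ for previous selection on $\hat{I}_1 = \hat{\boldsymbol{\beta}}_1'\hat{\boldsymbol{\gamma}}_1$ were 

$$\hat{\boldsymbol{\Gamma}}^* = \begin{bmatrix} 7.96 & -2.11 & 2.71 & 0.88 \\ -2.11 & 3.46 & -0.80 & -0.87 \\ 2.71 & -0.80 & 2.75 & 0.45 \\ 0.88 & -0.87 & 0.45 & 2.38 \end{bmatrix} \quad\text{and}\quad \hat{\mathbf{C}}^* = \begin{bmatrix} 24.40 & -5.65 & 5.47 & 1.39 \\ -5.65 & 8.55 & -1.63 & -1.41 \\ 5.47 & -1.63 & 9.26 & -0.17 \\ 1.39 & -1.41 & -0.17 & 6.49 \end{bmatrix}.$$

The estimated MLGSI accuracy, selection response, and expected genetic 
gain for stage 1 in the testing population were $\hat{\rho}_{HI_1} = \sqrt{\hat{\boldsymbol{\beta}}_1'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_1 / \mathbf{w}'\hat{\mathbf{C}}\mathbf{w}} = 0.71$, 
$\hat{R}_1 = k_1\sqrt{\hat{\boldsymbol{\beta}}_1'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_1} = 5.90$, and $\hat{\mathbf{E}}_1' = k_1\hat{\boldsymbol{\beta}}_1'\hat{\mathbf{A}}_1 / \sqrt{\hat{\boldsymbol{\beta}}_1'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_1} = [2.88 \;\; -1.53 \;\; 1.00 \;\; 0.49]$ 
respectively, whereas at stage 2, the estimated MLGSI accuracy, selection response, 
and expected genetic gain were $\hat{\rho}_{HI_2} = \sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}^*\mathbf{w} / \mathbf{w}'\hat{\mathbf{C}}^*\mathbf{w}} = 0.64$, $\hat{R}_2 = k_2\sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}^*\mathbf{w}} = 4.10$, 
and $\hat{\mathbf{E}}_2' = k_2\mathbf{w}'\hat{\boldsymbol{\Gamma}}^* / \sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}^*\mathbf{w}} = [1.74 \;\; -0.92 \;\; 0.85 \;\; 0.58]$ respectively. The estimated 
MLGSI accuracy, selection response, and expected genetic gain at stage 2 were 
lower than at stage 1. This means that the adjusted matrices $\hat{\boldsymbol{\Gamma}}^*$ and $\hat{\mathbf{C}}^*$ negatively 
affected the estimated MLGSI parameters at stage 2. The total estimated MLGSI 
selection response and expected genetic gain for stages 1 and 2 were $\hat{R}_1 + \hat{R}_2 = 9.99$ 
and $\hat{\mathbf{E}}_1' + \hat{\mathbf{E}}_2' = [4.62 \;\; -2.45 \;\; 1.85 \;\; 1.07]$. 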
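
The two-stage MLGSI figures above can be reproduced, up to rounding, with the following NumPy sketch of Eqs. (9.22) to (9.24) and (9.27). The matrices are the rounded estimates of this example; the adjusted matrices are entered as reported rather than recomputed, and the off-diagonal entry between traits 2 and 3 of $\hat{\boldsymbol{\Gamma}}$ is taken as $-2.33$, the value that appears in $\hat{\mathbf{A}}_1$. Variable names are our own.

```python
import numpy as np

w = np.array([1.0, -1.0, 1.0, 1.0])
k1, k2 = 0.744, 0.721

C = np.array([[36.21, -12.93, 8.35, 2.74],
              [-12.93, 13.04, -3.40, -2.24],
              [8.35, -3.40, 9.96, 0.16],
              [2.74, -2.24, 0.16, 6.64]])
Gamma = np.array([[16.26, -6.51, 5.60, 2.29],
                  [-6.51, 5.79, -2.33, -1.62],
                  [5.60, -2.33, 3.75, 0.94],
                  [2.29, -1.62, 0.94, 2.62]])
# Adjusted matrices as reported in the example (not recomputed here).
Gamma_star = np.array([[7.96, -2.11, 2.71, 0.88],
                       [-2.11, 3.46, -0.80, -0.87],
                       [2.71, -0.80, 2.75, 0.45],
                       [0.88, -0.87, 0.45, 2.38]])
C_star = np.array([[24.40, -5.65, 5.47, 1.39],
                   [-5.65, 8.55, -1.63, -1.41],
                   [5.47, -1.63, 9.26, -0.17],
                   [1.39, -1.41, -0.17, 6.49]])

# Stage 1: two traits, A_1 = first two rows of Gamma, Gamma_1 = leading 2 x 2 block.
A1, Gamma1 = Gamma[:2, :], Gamma[:2, :2]
beta1 = np.linalg.solve(Gamma1, A1 @ w)      # beta_1 = Gamma_1^{-1} A_1 w
sd1 = np.sqrt(beta1 @ Gamma1 @ beta1)        # standard deviation of I_1
R1 = k1 * sd1                                # selection response
E1 = k1 * (A1.T @ beta1) / sd1               # expected genetic gain per trait
acc1 = sd1 / np.sqrt(w @ C @ w)              # accuracy
corr12 = (beta1 @ A1 @ w) / (sd1 * np.sqrt(w @ Gamma @ w))

# Stage 2: the coefficient vector is w, with adjusted matrices.
sd2 = np.sqrt(w @ Gamma_star @ w)
R2 = k2 * sd2
E2 = k2 * (Gamma_star @ w) / sd2
acc2 = sd2 / np.sqrt(w @ C_star @ w)

print(np.round(beta1, 2), round(R1, 2), np.round(E1, 2), round(acc1, 2), round(corr12, 2))
print(round(R2, 2), np.round(E2, 2), round(acc2, 2))
```

Small differences in the last digit with respect to the reported values (for example in $\hat{R}_2$) are expected because the inputs are rounded to two decimals.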


9.5 The Multistage Restricted Linear Genomic Selection 
Index (MRLGSI) 


The restricted linear genomic selection index (RLGSI) described in Chap. 3 is 
extended to the multistage restricted linear genomic selection index (MRLGSI) 
context in a two-stage breeding selection scheme. 


9.5.1 The MRLGSI Parameters 


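In Sect. 9.4.1, we indicated that the MLGSI vector of coefficients at stage 1 can be 
written as $\boldsymbol{\beta}_1' = \mathbf{w}'\mathbf{A}_1'\boldsymbol{\Gamma}_1^{-1} = [\beta_{11} \;\; \beta_{12}]$ and at stage 2 as $\boldsymbol{\beta}_2' = \mathbf{w}'\mathbf{A}\boldsymbol{\Gamma}^{-1} = \mathbf{w}' = [w_1 \;\; w_2 \;\; w_3 \;\; w_4]$. 
It can be shown that the MRLGSI vector of coefficients 
is a linear transformation of vectors $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ made by matrix $\mathbf{K}_G$, which is a 
projector (see Chaps. 3 and 6 for details) that projects $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ into a space smaller 
than the original space of $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$. Thus, at stages 1 and 2, the MRLGSI vector of 
coefficients is 

$$\boldsymbol{\beta}_{R_1} = \mathbf{K}_{G_1}\boldsymbol{\beta}_1 \tag{9.29}$$

and 

$$\boldsymbol{\beta}_{R_2} = \mathbf{K}_{G_2}\boldsymbol{\beta}_2 = \mathbf{K}_{G_2}\mathbf{w}, \tag{9.30}$$

respectively, where $\mathbf{K}_{G_1} = [\mathbf{I} - \mathbf{Q}_{G_1}]$, $\mathbf{Q}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\boldsymbol{\Gamma}_1$, $\mathbf{K}_{G_2} = [\mathbf{I} - \mathbf{Q}_{G_2}]$, 
and $\mathbf{Q}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2)^{-1}\mathbf{U}_2'\boldsymbol{\Gamma}$ are matrix projectors. By Eqs. (9.29) and (9.30), the 
MRLGSI at stages 1 and 2 can be written as $I_{R_1} = \boldsymbol{\beta}_{R_1}'\boldsymbol{\gamma}_1$ and $I_{R_2} = \boldsymbol{\beta}_{R_2}'\boldsymbol{\gamma}$ respectively, 
where $\boldsymbol{\gamma}_1' = [\gamma_1 \;\; \gamma_2]$ and $\boldsymbol{\gamma}' = [\gamma_1 \;\; \gamma_2 \;\; \gamma_3 \;\; \gamma_4]$ are vectors of genomic breeding values, 
which can be estimated using GEBVs, as described in Chap. 5. In Chap. 6 we 
described methods for constructing matrix $\mathbf{U}'$ and estimating matrix $\mathbf{K}_G$; those 
methods are also valid in the MRLGSI context. 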

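In a similar manner to the MLGSI context, MRLGSI accuracies, expected genetic 
gains per trait, and selection responses for stages 1 and 2 in the testing population can 
be written as 

$$\rho_{HI_{R_1}} = \sqrt{\frac{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad\text{and}\quad \rho_{HI_{R_2}} = \sqrt{\frac{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}^*\boldsymbol{\beta}_{R_2}}{\mathbf{w}'\mathbf{C}^*\mathbf{w}}}, \tag{9.31}$$

$$\mathbf{E}_{R_1} = k_1\frac{\mathbf{A}_1'\boldsymbol{\beta}_{R_1}}{\sqrt{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}}} \quad\text{and}\quad \mathbf{E}_{R_2} = k_2\frac{\boldsymbol{\Gamma}^*\boldsymbol{\beta}_{R_2}}{\sqrt{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}^*\boldsymbol{\beta}_{R_2}}}, \tag{9.32}$$

and 

$$R_{R_1} = k_1\sqrt{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}} \quad\text{and}\quad R_{R_2} = k_2\sqrt{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}^*\boldsymbol{\beta}_{R_2}}, \tag{9.33}$$

respectively. The total MRLGSI expected genetic gain per trait and selection 
response for both stages are equal to $\mathbf{E}_{R_1} + \mathbf{E}_{R_2}$ and $R_{R_1} + R_{R_2}$. To simplify the 
notation, in Eqs. (9.32) and (9.33), we have omitted the intervals between stages or 
selection cycles ($L_G$). Matrices $\boldsymbol{\Gamma}^*$ and $\mathbf{C}^*$ in Eqs. (9.31) to (9.33) are matrices $\boldsymbol{\Gamma}$ and 
$\mathbf{C}$ adjusted for previous selection. 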

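In the MRLGSI context, matrices $\boldsymbol{\Gamma}^*$ and $\mathbf{C}^*$ can be obtained as 

$$\boldsymbol{\Gamma}^* = \boldsymbol{\Gamma} - u\frac{\mathbf{A}_1'\boldsymbol{\beta}_{R_1}\boldsymbol{\beta}_{R_1}'\mathbf{A}_1}{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}} \tag{9.34}$$

and 

$$\mathbf{C}^* = \mathbf{C} - u\frac{\mathbf{G}_1'\mathbf{b}_{R_1}\mathbf{b}_{R_1}'\mathbf{G}_1}{\mathbf{b}_{R_1}'\mathbf{P}_1\mathbf{b}_{R_1}}, \tag{9.35}$$

where $\boldsymbol{\beta}_{R_1}$ was defined in Eq. (9.29) and vector $\mathbf{b}_{R_1}$ can be obtained according to the 
RLPSI as described in Chap. 3. The term $u = k_1(k_1 - \tau)$ was defined earlier. 
The correlation between $I_{R_1} = \boldsymbol{\beta}_{R_1}'\boldsymbol{\gamma}_1$ and $I_{R_2} = \boldsymbol{\beta}_{R_2}'\boldsymbol{\gamma}$ can be written as 

$$\rho_{I_{R_1}I_{R_2}} = \frac{\boldsymbol{\beta}_{R_1}'\mathbf{A}_1\boldsymbol{\beta}_{R_2}}{\sqrt{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}}\sqrt{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{R_2}}}, \tag{9.36}$$

where $\sqrt{\boldsymbol{\beta}_{R_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{R_1}}$ and $\sqrt{\boldsymbol{\beta}_{R_2}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{R_2}}$ are the standard deviations of 
$I_{R_1} = \boldsymbol{\beta}_{R_1}'\boldsymbol{\gamma}_1$ and $I_{R_2} = \boldsymbol{\beta}_{R_2}'\boldsymbol{\gamma}$ respectively. In Eq. (9.36), matrix $\boldsymbol{\Gamma}$ was not adjusted for 
previous selection on $I_{R_1} = \boldsymbol{\beta}_{R_1}'\boldsymbol{\gamma}_1$. 
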
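9.5.2 Numerical Examples 


To illustrate the MRLGSI theory in a two-stage breeding selection scheme, we use 
the simulated data described in Sect. 9.4.3. In that subsection we indicated that the 
estimated covariance matrices of $\boldsymbol{\Gamma}_1$ and $\mathbf{A}_1$ were 

$$\hat{\boldsymbol{\Gamma}}_1 = \begin{bmatrix} 16.26 & -6.51 \\ -6.51 & 5.79 \end{bmatrix} \quad\text{and}\quad \hat{\mathbf{A}}_1 = \begin{bmatrix} 16.26 & -6.51 & 5.60 & 2.29 \\ -6.51 & 5.79 & -2.33 & -1.62 \end{bmatrix},$$

and that $\hat{\boldsymbol{\beta}}_1' = \mathbf{w}'\hat{\mathbf{A}}_1'\hat{\boldsymbol{\Gamma}}_1^{-1} = [1.39 \;\; -1.25]$ 
was the estimated MLGSI vector of coefficients at stage 1. At stage 2, the estimated 
MLGSI vector of coefficients was $\mathbf{w}' = [1 \;\; -1 \;\; 1 \;\; 1]$, the vector of economic 
weights. 

Suppose that we restrict only trait 2; then at stages 1 and 2, matrix $\mathbf{U}_1' = [0 \;\; 1]$ 
and matrix $\mathbf{U}_2' = [0 \;\; 1 \;\; 0 \;\; 0]$ respectively. In addition, $\hat{\mathbf{Q}}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1$, 
$\hat{\mathbf{Q}}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\hat{\boldsymbol{\Gamma}}\mathbf{U}_2)^{-1}\mathbf{U}_2'\hat{\boldsymbol{\Gamma}}$, $\hat{\mathbf{K}}_{G_1} = [\mathbf{I} - \hat{\mathbf{Q}}_{G_1}]$, and $\hat{\mathbf{K}}_{G_2} = [\mathbf{I} - \hat{\mathbf{Q}}_{G_2}]$ are the 
estimated matrices described in Eqs. (9.29) and (9.30) for stages 1 and 2. It can be 
shown that, at stages 1 and 2, $\hat{\boldsymbol{\beta}}_{R_1}' = \hat{\boldsymbol{\beta}}_1'\hat{\mathbf{K}}_{G_1}' = [1.39 \;\; 1.558]$ and $\hat{\boldsymbol{\beta}}_{R_2}' = \mathbf{w}'\hat{\mathbf{K}}_{G_2}' = [1.0 \;\; 1.81 \;\; 1.0 \;\; 1.0]$ 
are the MRLGSI vectors of coefficients respectively. 

Suppose that the total proportion retained for the two stages was 20%; then at 
stage 1, $k_1 = 0.744$ is an associated approximated selection intensity and the 
estimated MRLGSI selection response, expected genetic gain per trait, and accuracy 
were $\hat{R}_{R_1} = k_1\sqrt{\hat{\boldsymbol{\beta}}_{R_1}'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_{R_1}} = 3.083$, $\hat{\mathbf{E}}_{R_1}' = [2.225 \;\; 0 \;\; 0.742 \;\; 0.117]$, and 
$\hat{\rho}_{HI_{R_1}} = \sqrt{\hat{\boldsymbol{\beta}}_{R_1}'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_{R_1} / \mathbf{w}'\hat{\mathbf{C}}\mathbf{w}} = 0.370$ respectively. The estimated MRLGSI expected 
genetic gain, accuracy, and selection response at stage 2 were 
$\hat{\mathbf{E}}_{R_2}' = k_2\hat{\boldsymbol{\beta}}_{R_2}'\hat{\boldsymbol{\Gamma}}^* / \sqrt{\hat{\boldsymbol{\beta}}_{R_2}'\hat{\boldsymbol{\Gamma}}^*\hat{\boldsymbol{\beta}}_{R_2}} = [1.156 \;\; 0 \;\; 0.793 \;\; 0.536]$, $\hat{\rho}_{HI_{R_2}} = \sqrt{\hat{\boldsymbol{\beta}}_{R_2}'\hat{\boldsymbol{\Gamma}}^*\hat{\boldsymbol{\beta}}_{R_2} / \mathbf{w}'\hat{\mathbf{C}}^*\mathbf{w}}$, 
and $\hat{R}_{R_2} = k_2\sqrt{\hat{\boldsymbol{\beta}}_{R_2}'\hat{\boldsymbol{\Gamma}}^*\hat{\boldsymbol{\beta}}_{R_2}} = 2.485$ respectively, where $k_2 = 0.721$ was the approximated 
selection intensity value for stage 2. 

The estimated total MRLGSI selection response and expected genetic gain at 
stages 1 and 2 were $\hat{R}_{R_1} + \hat{R}_{R_2} = 5.568$ and $\hat{\mathbf{E}}_{R_1}' + \hat{\mathbf{E}}_{R_2}' = [3.380 \;\; 0 \;\; 1.535 \;\; 0.653]$ 
respectively. Note that, in effect, the expected genetic gain for trait 2 was 0, as 
expected. 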
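
The projector in Eqs. (9.29) and (9.30) and the stage 1 results above can be sketched with NumPy as follows. The helper name `restricted_projector` is hypothetical, and the inputs are the rounded estimates of Sect. 9.4.3, so the output agrees with the reported values only up to rounding. Stage 2 responses are not recomputed because the adjusted matrices for the restricted index are not listed in the text.

```python
import numpy as np

def restricted_projector(U, Gamma):
    """K_G = I - Q_G, with Q_G = U (U' Gamma U)^{-1} U' Gamma."""
    Q = U @ np.linalg.solve(U.T @ Gamma @ U, U.T @ Gamma)
    return np.eye(Gamma.shape[0]) - Q

w = np.array([1.0, -1.0, 1.0, 1.0])
k1 = 0.744
C = np.array([[36.21, -12.93, 8.35, 2.74],
              [-12.93, 13.04, -3.40, -2.24],
              [8.35, -3.40, 9.96, 0.16],
              [2.74, -2.24, 0.16, 6.64]])
Gamma = np.array([[16.26, -6.51, 5.60, 2.29],
                  [-6.51, 5.79, -2.33, -1.62],
                  [5.60, -2.33, 3.75, 0.94],
                  [2.29, -1.62, 0.94, 2.62]])
A1, Gamma1 = Gamma[:2, :], Gamma[:2, :2]

# Restrict trait 2 at both stages.
U1 = np.array([[0.0], [1.0]])
U2 = np.array([[0.0], [1.0], [0.0], [0.0]])

beta1 = np.linalg.solve(Gamma1, A1 @ w)                # unrestricted stage-1 vector
beta_R1 = restricted_projector(U1, Gamma1) @ beta1     # Eq. (9.29)
beta_R2 = restricted_projector(U2, Gamma) @ w          # Eq. (9.30)

sd_R1 = np.sqrt(beta_R1 @ Gamma1 @ beta_R1)
R_R1 = k1 * sd_R1                                      # selection response, stage 1
E_R1 = k1 * (A1.T @ beta_R1) / sd_R1                   # expected gain per trait, stage 1
acc_R1 = sd_R1 / np.sqrt(w @ C @ w)                    # accuracy, stage 1

print(np.round(beta_R1, 3), np.round(beta_R2, 2))
print(round(R_R1, 3), np.round(E_R1, 3), round(acc_R1, 3))
```

The second element of `E_R1` is zero up to floating-point error, which is the restriction imposed on trait 2.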


9.6 The Multistage Predetermined Proportional Gain 
Linear Genomic Selection Index 


The MPPG-LGSI is an adaptation of the predetermined proportional gain linear 
genomic selection index (PPG-LGSI) described in Chap. 6; thus, the theoretical 
results, properties, and objectives of both indices are similar. The MPPG-LGSI 
objective is to change $\mu_q$ to $\mu_q + d_q$, where $d_q$ is a predetermined change in $\mu_q$. We 
solve this problem by minimizing the mean squared difference between $I = \boldsymbol{\beta}'\boldsymbol{\gamma}$ and 
$H = \mathbf{w}'\mathbf{g}$ ($E[(H - I)^2]$) under the restriction $\mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta} = \theta_G\mathbf{d}$, where $\theta_G$ is a proportionality 
constant, $\mathbf{d}' = [d_1 \;\; d_2 \;\; \ldots \;\; d_r]$ is the vector of predetermined restrictions, $\mathbf{U}'$ is a 
matrix $(t - 1) \times t$ of 1s and 0s, and $\boldsymbol{\Gamma}$ is a covariance matrix of additive genomic 
breeding values, $\boldsymbol{\gamma}' = [\gamma_1 \;\; \gamma_2 \;\; \ldots \;\; \gamma_t]$, where $r$ is the number of predetermined restrictions 
and $t$ the number of traits. 


9.6.1 The MPPG-LGSI Parameters 


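According to the results in Chap. 6, at stages 1 and 2, the MPPG-LGSI vector of 
coefficients can be written as 

$$\boldsymbol{\beta}_{P_1} = \boldsymbol{\beta}_{R_1} + \theta_1\mathbf{U}_1(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1)^{-1}\mathbf{d} \tag{9.37}$$

and 

$$\boldsymbol{\beta}_{P_2} = \boldsymbol{\beta}_{R_2} + \theta_2\mathbf{U}_2(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2)^{-1}\mathbf{d}, \tag{9.38}$$

respectively, where $\boldsymbol{\beta}_{R_1} = \mathbf{K}_{G_1}\boldsymbol{\beta}_1$, $\boldsymbol{\beta}_{R_2} = \mathbf{K}_{G_2}\boldsymbol{\beta}_2 = \mathbf{K}_{G_2}\mathbf{w}$, $\mathbf{K}_{G_1} = [\mathbf{I} - \mathbf{Q}_{G_1}]$, 
$\mathbf{Q}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\boldsymbol{\Gamma}_1$, $\mathbf{K}_{G_2} = [\mathbf{I} - \mathbf{Q}_{G_2}]$, and $\mathbf{Q}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2)^{-1}\mathbf{U}_2'\boldsymbol{\Gamma}$ 
were described in Eqs. (9.29) and (9.30). Also, it can be shown that the proportionality 
constants for stages 1 ($\theta_1$) and 2 ($\theta_2$) are 

$$\theta_1 = \frac{\mathbf{d}'(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\mathbf{A}_1\mathbf{w}}{\mathbf{d}'(\mathbf{U}_1'\boldsymbol{\Gamma}_1\mathbf{U}_1)^{-1}\mathbf{d}} \quad\text{and}\quad \theta_2 = \frac{\mathbf{d}'(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2)^{-1}\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{w}}{\mathbf{d}'(\mathbf{U}_2'\boldsymbol{\Gamma}\mathbf{U}_2)^{-1}\mathbf{d}}, \tag{9.39}$$

respectively. By Eqs. (9.37) to (9.39), the MPPG-LGSI for stages 1 and 2 can be 
written as $I_{P_1} = \boldsymbol{\beta}_{P_1}'\boldsymbol{\gamma}_1$ and $I_{P_2} = \boldsymbol{\beta}_{P_2}'\boldsymbol{\gamma}$ respectively, where $\boldsymbol{\gamma}_1$ and $\boldsymbol{\gamma}$ are vectors of 
genomic breeding values, which can be estimated using GEBVs (see Chap. 5 for 
details). 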

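For stages 1 and 2, the MPPG-LGSI accuracies ($\rho_{HI_{P_1}}$ and $\rho_{HI_{P_2}}$), expected genetic 
gains per trait ($\mathbf{E}_{P_1}$ and $\mathbf{E}_{P_2}$), and selection responses ($R_{P_1}$ and $R_{P_2}$) can be written as 

$$\rho_{HI_{P_1}} = \sqrt{\frac{\boldsymbol{\beta}_{P_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{P_1}}{\mathbf{w}'\mathbf{C}\mathbf{w}}} \quad\text{and}\quad \rho_{HI_{P_2}} = \sqrt{\frac{\boldsymbol{\beta}_{P_2}'\boldsymbol{\Gamma}^*\boldsymbol{\beta}_{P_2}}{\mathbf{w}'\mathbf{C}^*\mathbf{w}}}, \tag{9.40}$$

$$\mathbf{E}_{P_1} = k_1\frac{\mathbf{A}_1'\boldsymbol{\beta}_{P_1}}{\sqrt{\boldsymbol{\beta}_{P_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{P_1}}} \quad\text{and}\quad \mathbf{E}_{P_2} = k_2\frac{\boldsymbol{\Gamma}^*\boldsymbol{\beta}_{P_2}}{\sqrt{\boldsymbol{\beta}_{P_2}'\boldsymbol{\Gamma}^*\boldsymbol{\beta}_{P_2}}}, \tag{9.41}$$

and 

$$R_{P_1} = k_1\sqrt{\boldsymbol{\beta}_{P_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{P_1}} \quad\text{and}\quad R_{P_2} = k_2\sqrt{\boldsymbol{\beta}_{P_2}'\boldsymbol{\Gamma}^*\boldsymbol{\beta}_{P_2}}, \tag{9.42}$$

respectively. The total MPPG-LGSI expected genetic gain per trait and selection 
response at both stages are equal to $\mathbf{E}_{P_1} + \mathbf{E}_{P_2}$ and $R_{P_1} + R_{P_2}$. To simplify the 
notation, in Eqs. (9.41) and (9.42), we omitted the intervals between stages or 
selection cycles ($L_G$). Matrices $\boldsymbol{\Gamma}^*$ and $\mathbf{C}^*$ are matrices $\boldsymbol{\Gamma}$ and $\mathbf{C}$ adjusted for 
previous selection on $I_{P_1}$ according to Eqs. (9.34) and (9.35) respectively in the 
MPPG-LGSI context. 

The correlation between $I_{P_1} = \boldsymbol{\beta}_{P_1}'\boldsymbol{\gamma}_1$ and $I_{P_2} = \boldsymbol{\beta}_{P_2}'\boldsymbol{\gamma}$ can be written as 

$$\rho_{12} = \frac{\boldsymbol{\beta}_{P_1}'\mathbf{A}_1\boldsymbol{\beta}_{P_2}}{\sqrt{\boldsymbol{\beta}_{P_1}'\boldsymbol{\Gamma}_1\boldsymbol{\beta}_{P_1}}\sqrt{\boldsymbol{\beta}_{P_2}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{P_2}}}. \tag{9.43}$$

In Eq. (9.43), matrix $\boldsymbol{\Gamma}$ was not adjusted for previous selection on $I_{P_1} = \boldsymbol{\beta}_{P_1}'\boldsymbol{\gamma}_1$. 
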


9.6.2 Numerical Examples 


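To illustrate the MPPG-LGSI theory, we use the simulated data described in 
Sect. 9.4.3. Suppose that we select two traits at stages 1 and 2; then, at stage 1, 

$$\hat{\boldsymbol{\Gamma}}_1 = \begin{bmatrix} 16.26 & -6.51 \\ -6.51 & 5.79 \end{bmatrix} \quad\text{and}\quad \hat{\mathbf{A}}_1 = \begin{bmatrix} 16.26 & -6.51 & 5.60 & 2.29 \\ -6.51 & 5.79 & -2.33 & -1.62 \end{bmatrix}$$

are the estimated covariance matrices of $\boldsymbol{\Gamma}_1$ and $\mathbf{A}_1$ respectively. We restricted trait 2 with 
$d = -2$; then, at stage 1 matrix $\mathbf{U}_1' = [0 \;\; 1]$ and at stage 2 matrix 
$\mathbf{U}_2' = [0 \;\; 1 \;\; 0 \;\; 0]$. In addition, $\hat{\mathbf{Q}}_{G_1} = \mathbf{U}_1(\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1$, $\hat{\mathbf{Q}}_{G_2} = \mathbf{U}_2(\mathbf{U}_2'\hat{\boldsymbol{\Gamma}}\mathbf{U}_2)^{-1}\mathbf{U}_2'\hat{\boldsymbol{\Gamma}}$, 
$\hat{\mathbf{K}}_{G_1} = [\mathbf{I} - \hat{\mathbf{Q}}_{G_1}]$, and $\hat{\mathbf{K}}_{G_2} = [\mathbf{I} - \hat{\mathbf{Q}}_{G_2}]$ are the estimates of matrix projectors associated 
with stages 1 and 2 (see Eqs. 9.37 and 9.38 for details). 

In Sect. 9.5.2, we showed that the estimated MRLGSI vector of coefficients 
for stage 1 was $\hat{\boldsymbol{\beta}}_{R_1}' = \hat{\boldsymbol{\beta}}_1'\hat{\mathbf{K}}_{G_1}' = [1.386 \;\; 1.558]$. Thus, by Eq. (9.37), to obtain 
$\hat{\boldsymbol{\beta}}_{P_1} = \hat{\boldsymbol{\beta}}_{R_1} + \hat{\theta}_1\mathbf{U}_1(\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}d$, we only need to obtain $\hat{\theta}_1$ and $\mathbf{U}_1(\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}d$, 
where $d = -2$ and 

$$\hat{\theta}_1 = \frac{d'(\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}\mathbf{U}_1'\hat{\mathbf{A}}_1\mathbf{w}}{d'(\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}d}.$$

It can be shown that $\mathbf{U}_1(\mathbf{U}_1'\hat{\boldsymbol{\Gamma}}_1\mathbf{U}_1)^{-1}d = [0 \;\; -0.345]'$ and $\hat{\theta}_1 = 8.125$; therefore, $\hat{\boldsymbol{\beta}}_{P_1}' = [1.39 \;\; -1.25]$ is the MPPG-LGSI 
vector of coefficients at stage 1. 

Suppose that the total proportion retained for the two stages was 20%; then, 
$k_1 = 0.744$ is an approximate selection intensity associated with MPPG-LGSI and 
the estimated MPPG-LGSI accuracy, selection response, and expected genetic gain 
at stage 1 were $\hat{\rho}_{HI_{P_1}} = \sqrt{\hat{\boldsymbol{\beta}}_{P_1}'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_{P_1} / \mathbf{w}'\hat{\mathbf{C}}\mathbf{w}} = 0.71$, $\hat{R}_{P_1} = k_1\sqrt{\hat{\boldsymbol{\beta}}_{P_1}'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_{P_1}} = 5.90$ and 
$\hat{\mathbf{E}}_{P_1}' = k_1\hat{\boldsymbol{\beta}}_{P_1}'\hat{\mathbf{A}}_1 / \sqrt{\hat{\boldsymbol{\beta}}_{P_1}'\hat{\boldsymbol{\Gamma}}_1\hat{\boldsymbol{\beta}}_{P_1}} = [2.88 \;\; -1.53 \;\; 1.00 \;\; 0.49]$ respectively. 

It can be shown that at stage 2, $d'(\mathbf{U}_2'\hat{\boldsymbol{\Gamma}}\mathbf{U}_2)^{-1}\mathbf{U}_2' = [0 \;\; -0.345 \;\; 0 \;\; 0]$, $\hat{\theta}_2 = 8.125$ 
and $\hat{\boldsymbol{\beta}}_{P_2}' = \mathbf{w}' = [1 \;\; -1 \;\; 1 \;\; 1]$. Thus, the estimated MPPG-LGSI accuracy, 
selection response, and expected genetic gain at this stage were 
$\hat{\rho}_{HI_{P_2}} = \sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}^*\mathbf{w} / \mathbf{w}'\hat{\mathbf{C}}^*\mathbf{w}} = 0.64$, $\hat{R}_{P_2} = k_2\sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}^*\mathbf{w}} = 4.10$, and $\hat{\mathbf{E}}_{P_2}' = k_2\mathbf{w}'\hat{\boldsymbol{\Gamma}}^* / \sqrt{\mathbf{w}'\hat{\boldsymbol{\Gamma}}^*\mathbf{w}} = 
[1.74 \;\; -0.92 \;\; 0.85 \;\; 0.58]$ respectively, where $k_2 = 0.721$. The estimated total 
MPPG-LGSI selection response and expected genetic gain for both stages were $\hat{R}_{P_1} + \hat{R}_{P_2} = 9.99$ 
and $\hat{\mathbf{E}}_{P_1}' + \hat{\mathbf{E}}_{P_2}' = [4.62 \;\; -2.45 \;\; 1.85 \;\; 1.07]$ respectively. Note that 
the total expected genetic gain for trait 2 was $-2.45$, which is similar to $d = -2$, the 
PPG imposed by the breeder. Finally, to simplify the notation, we omitted the 
intervals between stages or selection cycles ($L_G$) in the estimated MPPG-LGSI 
selection response and expected genetic gain for both stages. 
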
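
Equations (9.37) to (9.39) can be sketched with NumPy as shown below. The helper name `ppg_coefficients` is hypothetical, and the inputs are the rounded estimates of Sect. 9.4.3 with trait 2 restricted and $d = -2$. With a single restricted trait, the code returns $\hat{\theta}_1 = \hat{\theta}_2 = 8.125$ and coefficient vectors equal to the unrestricted ones, in line with the values reported in this example.

```python
import numpy as np

def ppg_coefficients(U, Gamma_s, A_s, beta_unrestricted, w, d):
    """Return (beta_P, theta) for one stage, following Eqs. (9.37) to (9.39).

    beta_P = K_G beta + theta * U (U' Gamma_s U)^{-1} d, with
    theta  = d'(U' Gamma_s U)^{-1} U' A_s w / d'(U' Gamma_s U)^{-1} d.
    """
    inv = np.linalg.inv(U.T @ Gamma_s @ U)
    K = np.eye(Gamma_s.shape[0]) - U @ inv @ U.T @ Gamma_s
    beta_R = K @ beta_unrestricted
    theta = (d @ inv @ U.T @ A_s @ w) / (d @ inv @ d)
    return beta_R + theta * (U @ inv @ d), theta

w = np.array([1.0, -1.0, 1.0, 1.0])
d = np.array([-2.0])  # predetermined gain for the restricted trait
Gamma = np.array([[16.26, -6.51, 5.60, 2.29],
                  [-6.51, 5.79, -2.33, -1.62],
                  [5.60, -2.33, 3.75, 0.94],
                  [2.29, -1.62, 0.94, 2.62]])
A1, Gamma1 = Gamma[:2, :], Gamma[:2, :2]
U1 = np.array([[0.0], [1.0]])
U2 = np.array([[0.0], [1.0], [0.0], [0.0]])

beta1 = np.linalg.solve(Gamma1, A1 @ w)  # unrestricted stage-1 vector

# Stage 1 uses Gamma_1 and A_1; stage 2 uses Gamma for both (A = Gamma) and beta_2 = w.
beta_P1, theta1 = ppg_coefficients(U1, Gamma1, A1, beta1, w, d)
beta_P2, theta2 = ppg_coefficients(U2, Gamma, Gamma, w, w, d)

print(round(float(theta1), 3), np.round(beta_P1, 2))
print(round(float(theta2), 3), np.round(beta_P2, 2))
```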
Chapter 10 
Stochastic Simulation of Four Linear 
Phenotypic Selection Indices 


Fernando H. Toledo, José Crossa, and Juan Burgueño 


Abstract Stochastic simulation can contribute to a better understanding of the 
problem, and has already been successfully applied to evaluate other breeding 
scenarios. Despite all the theories developed in this book concerning different 
types of indices, including phenotypic data and/or data on molecular markers, no 
examples have been presented showing the long-term behavior of different indices. 
The objective of this chapter is to present some results and insights into the in silico 
(computer simulation) performance comparison of over 50 selection cycles of a 
recurrent and generic population breeding program with different selection indices, 
restricted and unrestricted. The selection indices included in this stochastic simulation 
were the linear phenotypic selection index (LPSI), the eigen selection index 
method (ESIM), the restrictive LPSI, and the restrictive ESIM. 


10.1 Stochastic Simulation 


Simulations were used to evaluate the accuracy, effectiveness, response to selection, 
and the decrease in the overall genetic variance in a recurrent selection scheme under 
the use of the Smith (1936) and Hazel (1943) index (or linear phenotypic selection 
index, LPSI, see Chap. 2 for details); the eigen selection index method (ESIM, see 
Chap. 7 for details); the Kempthorne and Nordskog (1959) restricted index (K&N or 
restricted phenotypic selection index, RLPSI, see Chap. 3 for details); and the 
restricted eigen selection index method (RESIM, see Chap. 3 for details). The 
different scenarios are described below and encompass variations in the nature of 
the genetic correlation between traits in addition to their expected heritabilities. 
10.1.1 Breeding Design 


A total of 50 forward recurrent selection cycles of modern breeding were simulated, 
in which the breeder has the ability to select based on breeding value estimates of 
genetically correlated traits, and to apply the various above-mentioned selection 
indices. All simulated scenarios (described below) followed a common general 
breeding design. In each cycle, 350 full sib progenies (51) were generated taking 
700 parents at random from the base population. From each progeny, 100 double- 
haploid lines were randomly derived (which shortened the cycle interval by five 
inbreeding generations). The simulated phenotypic values of the 35,000 resulting 
lines were then evaluated in simulated trials. The selection was made by means of the 
progeny average performance. The selected progenies (top quarter) according to 
each index were then recombined by random mating a sample of the lines within the 
progeny to recover the population for the next cycle. 


10.1.2 Simulating Quantitative Traits 


Genetically correlated quantitative traits were simulated assuming a full pleiotropic 
model. This was carried out by randomly sampling genetic effects for all segregating 
sites from a multivariate normal distribution with zero mean and a previously stated 
variance—covariance. The genetic effects were in turn used to compute true breeding 
values (TBVs). An individual’s phenotype was obtained by taking its TBV and 
adding a normally distributed random term with zero mean and variance consistent with the 
expected heritability ($h^2$) for the trait at which phenotyping occurred. The genetic 
variance in each cycle was calculated as the variance of the TBV of the individuals in 
that generation. However, it was expressed as relative values of the genetic variance 
in the initial cycle. The realized response to selection was also standardized in units 
of the genetic standard deviation in cycle 0. Cycle 0 was used as the base generation 
because it represents the available genetic variability, and also to observe, from the 
start, the genetic changes in future breeding generations. 
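
A minimal NumPy sketch of this trait model follows. It is not the simulation code used for this chapter: the number of individuals, the independent sampling of genotypes (no linkage), the unit effect variances, and all variable names are assumptions made for illustration. Only the structure is taken from the text: pleiotropic effects drawn from a multivariate normal distribution, TBVs as sums of effects, and residual variance chosen to match the expected heritability.

```python
import numpy as np

rng = np.random.default_rng(42)
n_individuals, n_loci = 500, 1000
h2 = np.array([0.2, 0.5, 0.5])  # expected heritabilities of the three traits

# Effect covariance: correlation 0.5 between traits 1 and 2, trait 3 independent.
effect_cov = np.array([[1.0, 0.5, 0.0],
                       [0.5, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# Full pleiotropy: every locus has an effect on every trait.
effects = rng.multivariate_normal(np.zeros(3), effect_cov, size=n_loci)

# Simplification: independent loci with allele frequency 0.5 (allele counts 0, 1, 2).
genotypes = rng.binomial(2, 0.5, size=(n_individuals, n_loci)).astype(float)

# True breeding values and their variance per trait.
tbv = genotypes @ effects
var_g = tbv.var(axis=0)

# Residual variance consistent with h2 = var_g / (var_g + var_e).
var_e = var_g * (1.0 - h2) / h2
phenotypes = tbv + rng.normal(0.0, np.sqrt(var_e), size=tbv.shape)

# Genetic variance relative to the base generation (equal to 1 in cycle 0).
relative_var = var_g / var_g
print(np.round(var_g / phenotypes.var(axis=0), 2))  # realized heritabilities
```

Realized heritabilities fluctuate around the expected values from one replicate to another, which is why the chapter reports averages over Monte Carlo replicates.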

An empirical genome was considered comprising a set of 10 linkage groups 
(chromosomes), each 200 cM in length, and 1000 uniformly distributed segregating 
sites. To represent the historical evolution and recent breeding efforts up to the 
present day in addition to incorporating a steady state of known linkage disequilib- 
rium (LD) structure existing in crops, the starting populations (cycle 0) were taken 
after 200 generations of random mating within an effective population size of 1000 
segregating for all loci in which the allele frequency was 0.5. 

The in silico meiosis reflected the Mendelian laws of segregation for diploid 
species, by a count-location process that mimics the Haldane map function (Haldane 
1919). Thus, homologous chromosomes are paired into bivalents and recombined 
through randomly positioned chiasmata. The number of chiasmata follows a Poisson 
distribution, where the λ parameter represents the chromosome length in Morgans 
and their positions are uniformly distributed, i.e., without interference between 
crossovers or any mutagenesis process. 
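
The count-location process can be sketched as follows. This is an illustrative Python version, not the C++ routine used for the simulations; the function name `gamete`, the number of loci per chromosome, and the choice to treat every sampled chiasma as a crossover on the transmitted chromatid are assumptions of this sketch.

```python
import numpy as np

def gamete(hap_a, hap_b, positions, length_morgans, rng):
    """Build one recombinant gamete from two parental haplotypes.

    The number of crossovers is Poisson with mean equal to the chromosome
    length in Morgans, and crossover positions are uniform (no interference).
    """
    n_crossovers = rng.poisson(length_morgans)
    breaks = np.sort(rng.uniform(0.0, length_morgans, size=n_crossovers))
    # Number of crossovers located to the left of each locus.
    n_left = np.searchsorted(breaks, positions)
    start = rng.integers(2)  # homolog from which the gamete starts
    from_b = (n_left + start) % 2 == 1
    return np.where(from_b, hap_b, hap_a)

rng = np.random.default_rng(7)
length = 2.0                                # 200 cM chromosome
positions = np.linspace(0.0, length, 100)   # assumed: 100 evenly spaced loci
hap_a = np.zeros(100, dtype=int)            # haplotype inherited from one parent
hap_b = np.ones(100, dtype=int)             # haplotype inherited from the other

g = gamete(hap_a, hap_b, positions, length, rng)
print(g)
print("switches along the gamete:", int(np.abs(np.diff(g)).sum()))
```

With a chromosome of length zero no crossover can occur, so the gamete is an exact copy of one parental haplotype; this gives a simple check of the routine.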
10.1.3 Simulated Scenarios 


Three traits were considered, one with low heritability (the first, $h^2 = 0.2$) and two 
with high heritability (the second and the third, $h^2 = 0.5$). The correlations between 
the first and second trait vary from positive ($\rho_G = 0.5$) to negative ($\rho_G = -0.5$). The 
third trait was always considered with segregation independent from the two others. 

The selection process involved two unrestricted indices: the LPSI (see Chap. 2), 
which ranks the progenies based on the average merit of their lines considering equal 
economic weights for all traits, and the ESIM (see Chap. 7), where the progenies 
were ranked in terms of ESIM values. Regarding the restricted selection indices, the 
RLPSI (or K&N) was employed (see Chap. 3) with equal economic weights for the 
traits in addition to the RESIM (see Chap. 7). Because of the restrictions, two 
different situations were evaluated in the latter cases, i.e., where the restrictions 
were applied for each of the first and second traits separately. 

Thus, all simulated scenarios encompass a three-way factorial: four selection 
procedures (the LPSI, the ESIM, the RLPSI or K&N, and the RESIM); two 
correlation scenarios, positive ($\rho_G = 0.5$) and negative ($\rho_G = -0.5$) correlations 
between the first and second trait; and two constraint situations, where the restrictions 
were applied separately for the first and second traits. 

To simulate genetically correlated traits a full pleiotropic model was assumed. 
Gene effects were sampled from a multivariate normal distribution with zero mean 
and a previously stated variance-covariance matrix. In that sense it is possible to 
represent a quantitative and infinitesimal model. Each gene has its own effect, 
which varies according to a probability density, i.e., genes have positive and negative 
effects of varying size: alleles with large effects at lower frequency (major 
genes) and alleles with modest effects at higher frequency (minor genes). 


10.1.4  Inferences 


Results are presented as summaries of 100 Monte Carlo replicates for each scenario 
and include the response to selection, decreases in the genetic variance, selection 
accuracy, and observed heritabilities. The meiosis routine was implemented in C++, 
and compiled and linked through the facilities provided by the Rcpp R package 
(Eddelbuettel 2013). All simulations were performed, analyzed, and summarized in 
R version 3.3.3 (R Development Core Team 2017). 


10.2 Results 


The results of the simulations for the four selection indices under 
the different trait genetic correlations and restriction scenarios are presented in 
terms of the consistency of the observed heritabilities of the traits; the response to 
selection and changes in genetic variance for each trait; and the accuracy of the 
indices’ selection. 

First of all, the results show the stability of the Monte Carlo replicates in terms of 
possible deviations in the observed heritability from that expected, which in turn 
may affect further inferences (Table 10.1). The type I error (α) of the t test comparing 
expected and observed heritabilities for all simulated scenarios did not show important 
and significant departures. Slight departures that may be due to Monte Carlo 
error (P < 0.05) were found, namely: for both high and low heritability traits of the 
LPSI at cycle 5 when they were negatively correlated; for the independent trait also 
with the LPSI at cycle 50, but when the other traits are positively correlated; for the 
highly heritable trait at the first and last cycles, both under positive correlation in the 
ESIM and RESIM indices respectively; and for the low heritability trait in both 
restricted indices (RLPSI and RESIM) in cycles 0 and 5 for negative 
and positive correlations respectively. 

A complementary estimate of the power (type II error or β) of the tests was 
performed considering departures from the expected heritabilities of 1%. It was 
verified that the average power of the observed estimates was around 70%, which 
reinforces the appropriateness of the simulation findings. 


10.2.1 Realized Genetic Gains 


Figure 10.1 shows the average genetic gains (expressed as standard deviations from 
the mean of cycle 0) for cycles 0—50 for the traits (low and high heritabilities and the 
independent trait); the four selection indices (unrestricted: LPSI and ESIM and 
restricted: RLPSI and RESIM) when the correlations are positive and negative. 

It is important to note that even after 50 recurrent cycles none of the scenarios has 
shown any indication that the selection plateau has been reached (Fig. 10.1). It is 
considered that even with the variation of the gains in the scenarios, there were 
increases in the merit of the target traits. Thus, the employment of selection indices is 
an effective way of achieving progress in long-term multi-trait selection. 

As expected, the unrestricted selection indices have shown genetic gains higher 
than their restricted counterparts (Fig. 10.1). It must be highlighted that the restrictions 
worked as intended because when any trait was restricted, no gains were 
obtained for that trait (data not shown). The higher gains obtained with unrestricted 
indices in comparison with their restricted counterparts are well known and justified 
because the net genetic merit benefits from the gains in all traits, whereas, with 
gains constrained to zero in some traits, there are no indirect gains, which would 
otherwise be most relevant under positive correlations. 

The independent trait has presented the higher gains in comparison with the other 
traits for all correlation and selection process scenarios. The higher gains, however, 
were for the RESIM followed by the RLPSI in both positive and negative correla- 
tions (Fig. 10.1e and f). These findings may be understood both under the nature of 
the trait (independent inheritance) and over the properties of the restricted indices. 
Fig. 10.1 Average genetic gains in 100 Monte Carlo replicates for traits with low and high 
heritability ($h^2$ = 0.2 and 0.5) and independent along cycles 0 to 50 of a simulated selection given 
four indices, the linear phenotypic selection index (LPSI), the ESIM, the restricted linear phenotypic 
selection index (RLPSI), and the RESIM with positive (0.5) and negative ($-0.5$) correlations 
between the traits with low $h^2$ and high $h^2$. (a) Gains for the trait with low heritability when it is 
negatively correlated with the high heritability trait. (b) Gains for the trait with low heritability when 
it is positively correlated with the high heritability trait. (c) Gains for the trait with high heritability 
when it is negatively correlated with the low heritability trait. (d) Gains for the trait with high 
heritability when it is positively correlated with the low heritability trait. (e) Gains for the 
independent trait when the other traits are negatively correlated. (f) Gains for the independent 
trait when the other traits are positively correlated 


Because the third trait is independent of the others, it receives no indirect effects 
from the constraints on the gains of the other traits. With regard to the technical 
features of the RESIM, it must be emphasized that, because of the eigen 
decomposition, the first eigenvector gives the highest weight to the most variable 
trait, which in this case is the independent trait, and this leads to its distinct gains. 

The Smith index (LPSI) and the ESIM produce similar genetic gains for highly 
heritable traits when the genetic correlations are positive (Fig. 10.1d). The ESIM is 
simply another way of obtaining the LPSI based on eigen decomposition theory, 
which avoids the assignment of economic weights. Thus, the results show that 
similar outcomes can be obtained with both indices. However, the ESIM is the 
preferred index because of its advantages over the LPSI: no subjective decision is 
needed to select economic weights, and it has better statistical sampling properties. 
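
The contrast between the two indices can be sketched numerically. The following is a minimal illustration, assuming the standard forms b = P⁻¹Gw for the LPSI and the first eigenvector of P⁻¹G for the ESIM; the two-trait matrices are made up (heritabilities 0.2 and 0.5, genetic correlation 0.5) and the function names are hypothetical, not part of the simulation code used in this chapter.

```python
import math

def inverse_2x2(m):
    """Invert a 2 x 2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    """Multiply two small matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def lpsi_weights(P, G, w):
    """LPSI coefficients b = P^-1 G w; economic weights w are required."""
    return matvec(matmul(inverse_2x2(P), G), w)

def esim_weights(P, G, iterations=500):
    """ESIM coefficients: dominant eigenvector of P^-1 G, by power iteration."""
    M = matmul(inverse_2x2(P), G)
    b = [1.0, 1.0]
    for _ in range(iterations):
        b = matvec(M, b)
        norm = math.sqrt(sum(x * x for x in b))
        b = [x / norm for x in b]
    # b has unit length, so b'Mb is the corresponding eigenvalue.
    eigenvalue = sum(x * y for x, y in zip(b, matvec(M, b)))
    return b, eigenvalue

# Illustrative (made-up) matrices, not taken from the simulation.
P = [[5.0, 1.5], [1.5, 2.0]]   # phenotypic covariance
G = [[1.0, 0.5], [0.5, 1.0]]   # genetic covariance
w = [1.0, 1.0]                 # economic weights, needed only by the LPSI

b_lpsi = lpsi_weights(P, G, w)
b_esim, lambda_esim = esim_weights(P, G)
print("LPSI b:", b_lpsi)
print("ESIM b:", b_esim, "eigenvalue:", lambda_esim)
```

Only the LPSI function takes the vector w as an argument; the ESIM coefficients depend on P and G alone, which is the sense in which the index avoids economic weights.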

When the traits are negatively correlated, the trait with greater heritability 
showed substantial realized genetic gains with the ESIM and similar gains with the 
LPSI and its restricted analog, i.e., the RLPSI (Fig. 10.1a and c). In addition, 
when traits are negatively correlated, restricting the trait with low heritability is 
an alternative that ensures progress in the highly heritable trait similar to that of 
the unrestricted indices. In contrast, the ESIM has the worst performance for the 
trait with lower heritability when the traits are negatively correlated 
(Fig. 10.1a). 

On the other hand, as already pointed out, the ESIM surpasses all the other indices 
with regard to the highly heritable trait (Fig. 10.1c and a). The reason is similar 
to the one given above regarding the properties of the eigen decomposition. When 
the first trait is negatively correlated with the second one, heavier weight is 
given to the trait with higher heritability than to the trait with low heritability. 
However, when the traits are positively correlated, synergistic and indirect effects 
increase both traits, each positively affecting the other. 

When the traits are positively correlated but with low heritability, the LPSI and 
the ESIM have similar realized genetic gains until cycle 25; after this selection cycle, 
the LPSI is superior to the ESIM (Fig. 10.1b). In this case, the two restricted indices, 
the RLPSI and the RESIM, give lower realized genetic gains than the LPSI and 
the ESIM (Fig. 10.1b). Finally, for the third trait (the independent one), the 
RESIM provides the greatest realized genetic gains (Fig. 10.1e and f). 


10.2.2 Genetic Variances 


Figure 10.2 shows the average relative decreases in the genetic variances along the 
50 cycles of selection for the three traits (the low and high heritability traits and 
the independent trait) under the four selection indices, restricted (the RLPSI and 
the RESIM) and unrestricted (the LPSI and the ESIM), with both negative and 
positive correlations between the first and second traits. 

As a general result, the genetic variance clearly decreased along the recurrent 
selection cycles (Fig. 10.2). From the most conservative decrease (around 40% in 
Fig. 10.2a and b) to the sharpest decrease (close to 10% in Fig. 10.2e and f), and 
in contrast to the trends in genetic gains, it can be inferred that the genetic 
variability was not yet exhausted by selection. This observation supports what was 
said regarding the effectiveness of the selection indices as a criterion for 
long-term multi-trait selection. 

As expected, the restricted indices are more conservative, maintaining greater 
genetic variance (Fig. 10.2). They are designed to prevent the restricted trait from 
changing its genetic merit. Thus, they tend to keep its genetic variance unchanged, 
Fig. 10.2 Average genetic variances in 100 Monte Carlo replicates for traits with low and high 
heritability (h² = 0.2 and 0.5) and an independent trait along cycles 0-50 of a simulated selection 
given four selection indices, the LPSI, the ESIM, the RLPSI, and the RESIM, with positive (0.5) 
and negative (-0.5) correlations between the low h² and high h² traits. (a) Genetic variance of the 
low heritability trait when it is negatively correlated with the high heritability trait. (b) Genetic 
variance of the low heritability trait when it is positively correlated with the high heritability 
trait. (c) Genetic variance of the high heritability trait when it is negatively correlated with the 
low heritability trait. (d) Genetic variance of the high heritability trait when it is positively 
correlated with the low heritability trait. (e) Genetic variance of the independent trait when the 
other traits are negatively correlated. (f) Genetic variance of the independent trait when the other 
traits are positively correlated 


which is reflected in the lower decreases in the genetic variance, even under the 
indirect effects of the other traits. 

It should be noted that there was a slight increase in variance in the short term 
(up to cycle 3) for the trait with lower heritability when negatively correlated with 
the highly heritable one (Fig. 10.2a and b). This is an outcome of the changes in 
allele frequencies of the first trait due to the indirect effects of the second trait and/or 
the release of genetic disequilibrium owing to the assortative mating of the individ- 
uals given higher weights regarding the second trait (highly heritable). 

Reflecting the findings regarding the genetic gains (Fig. 10.1), the trait with the 
strongest average decrease in genetic variance was the one in which the response 
to selection was most pronounced, i.e., the independent trait (Fig. 10.2e and f). 
This trait showed the strongest decreases under selection with the ESIM in both the 
positive and negative correlation scenarios. As mentioned before, because the third 
trait is independent of the others, a greater response to selection was achieved in 
that trait, with correspondingly strong changes in allele frequencies, which drove 
the decreases in genetic variance. 

When the heritability is high, it is easy to differentiate the trends in the decrease in 
the genetic variance between restricted and unrestricted indices (Fig. 10.2c). The 
difference is most evident when the traits are positively correlated (Fig. 10.2d), 
where the ESIM has the largest decreases, followed by the LPSI. For the trait with 
low heritability, however, the decreases in genetic variance are indistinguishable 
between the indices, showing that the effectiveness of the response to selection is a 
function of the heritability (Fig. 10.2a and b). 


10.2.3 Selection Accuracy 


The accuracy of the selection was measured as the square root of the correlation 
between the net genetic merit and the estimated linear function of each index. 
Figure 10.3 shows the absolute accuracies (left axis) and relative values in relation 
to the mean accuracy of the first cycle (right axis) for all indices in both negative 
(Fig. 10.3a) and positive (Fig. 10.3b) correlation scenarios. 

In all cases, a reduction in the selection precision of all the indices was observed. 
The effect of selection is the improvement in the genetic merit of the traits by means 
of changes in allele frequencies that also affect/decrease the genetic variance. 
However, as a side effect, the selection becomes harder and has lower precision. 

The LPSI showed greater accuracy than the other indices in every scenario 
(Fig. 10.3a and b). Its main feature is precisely that it maximizes the correlation 
between the net genetic merit and the linear combination of the traits. It may be 
Fig. 10.3 Average absolute and relative accuracy of selection in 100 Monte Carlo replicates for 
traits with low and high heritability (h²) and an independent trait along cycles 0-50 of a simulated 
selection given four selection indices, the LPSI, the ESIM, the RLPSI, and the RESIM, with 
positive and negative correlations between the low h² and high h² traits 




argued that the ESIM also does that; however, both indices are the best linear 
predictors only when the phenotypic and genotypic variances and covariances are 
known. Thus, according to what was found, the ESIM appears to be more affected by 
sampling error when the variance and covariance matrices are estimated 
(Fig. 10.3a). 

For the scenario with positive correlations, the differences between the two types 
of indices, the restricted ones and the unrestricted ones, were clear, as the 
unrestricted indices have shown greater selection accuracy (Fig. 10.3b). This reflects 
the fact that the restricted index constrains the gains by means of restrictions in the 
correlation between the net genetic merit and the linear combination of the traits. 
Chapter 11 
RIndSel: Selection Indices with R 


Gregorio Alvarado, Angela Pacheco, Sergio Pérez-Elizalde, Juan Burgueño, 
and Francisco M. Rodríguez 


Abstract RIndSel is a graphical user interface that uses selection index theory to 
select individual candidates as parents for the next selection cycle. The index can be 
a linear combination of phenotypic values, genomic estimated breeding values, or a 
linear combination of phenotypic values and marker scores. Depending on the 
restriction imposed on the expected genetic gain per trait, the index can be 
unrestricted, null restricted, or a predetermined proportional gain index. RIndSel is 
compatible with any of the following versions of Windows: XP, 7, 8, and 10. 
Furthermore, it can be installed on 32-bit and 64-bit computers. In the context of 
fixed and mixed models, RIndSel estimates the phenotypic and genetic covariances 
using two main experimental designs: randomized complete block design and 
lattice or alpha-lattice design. In the following, we explain how RIndSel can be used 
to determine individual candidates as parents for the next cycle of improvement. 


11.1 Background 


The linear selection index theory (see Chaps. 2 to 9 for details) can be difficult to 
apply without the use of specific codes developed in Statistical Analysis System 
(SAS) software. At the International Maize and Wheat Improvement Center 
(CIMMYT, for its Spanish acronym), codes were developed in SAS software 
version 9.4 (SAS Institute 2017) that can help to determine individuals as parents 
for the next selection cycle. The SAS codes can be found at the following link: 
https://data.cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10242. 

Afterward, the SAS codes were translated to R language as scripts (Pacheco 
et al. 2017) and denoted by RIndSel (R software to analyze Selection Indices), with 
the objective of creating a user-friendly graphical user interface (GUI) in JAVA. 
The link to download the software is: 
https://data.cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10854. 


11.2 Requirements, Installation, and Opening 


RIndSel is compatible with the Windows platform, in any of the following versions: 
XP, 7, 8, and 10; furthermore, it can be installed on 32-bit and 64-bit computers. To 
install RIndSel on a computer, the user must double-click on the executable file 
downloaded from the link given above and then follow the instructions that appear in 
the installation box. Once RIndSel has been installed, it can be opened by: 


1. Double-clicking on the shortcut located on the desktop. 

2. Locating it in the Windows menu and clicking. 

3. Locating the software via the pathway C:/RIndSel, and double-clicking on 
RIndSel.exe. 


As we shall see, the software has been partitioned into two modules. 


11.3 First Module: Data Reading and Helping 


This module (Fig. 11.1) displays two small buttons at the upper left, labeled “Open 
File” and “Help.” With Open File, the user can browse to and open, for example, 
the file of phenotypic data, which should contain information 
Fig. 11.1 Module for reading data 




[Screenshot: Excel spreadsheet of the phenotypic data file “Data Phenotypes-Ibadan Base,” with 
the design columns REP, Block, and ENTRY followed by trait columns such as GY, EHT, and PHT] 


Fig. 11.2 Steps for saving a comma delimited file 


associated with the experimental design. This file contains the field book 
information: the experimental design variables appear in the first columns, whereas 
the remaining columns contain the traits measured in the field; design variables and 
traits are linked by the plot number. Beforehand, the data set should have been 
entered in a spreadsheet using Excel or similar software and saved as a comma 
delimited file. To save the data as a comma delimited file in Excel, open the Excel 
file that contains the data set (Fig. 11.2) and select from the main menu: FILE → 
Save As → Browser View Options (look for the path where the data will be saved) → 
Save as type (look for CSV, comma separated values). The file name should end in 
“.csv,” indicating that the file is ready to be used. 
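
For users who prepare their data outside Excel, the same layout can be produced by a script. The following is a minimal sketch, assuming only what is described above (design columns first, trait columns afterward, comma delimited); the column names, the trait values, and the function name are made up for illustration and are not required by RIndSel.

```python
import csv
import io

def write_field_book(rows, design_columns, trait_columns, handle):
    """Write plot records as comma-delimited text, design columns first."""
    writer = csv.DictWriter(handle, fieldnames=design_columns + trait_columns)
    writer.writeheader()
    writer.writerows(rows)

# Made-up plot records; names and values are placeholders.
plots = [
    {"REP": 1, "Block": 1, "ENTRY": 14, "T1": 161.2, "T2": 45.3},
    {"REP": 1, "Block": 1, "ENTRY": 19, "T1": 158.7, "T2": 47.9},
    {"REP": 2, "Block": 3, "ENTRY": 14, "T1": 163.0, "T2": 44.1},
]

# An in-memory buffer stands in for open("phenotypes.csv", "w", newline="").
buffer = io.StringIO()
write_field_book(plots, ["REP", "Block", "ENTRY"], ["T1", "T2"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing the buffer with a real file handle yields a file whose name ends in “.csv” and which can then be opened through Open File.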

The small button “Help” (Fig. 11.1) gives access to basic items such as the 
installation manual and software licenses. The installation manual provides a brief 
description of the selection indices that can be calculated and the path to where the 
software is located (Fig. 11.3). Furthermore, it shows folders related to the software 
features, such as how the software can be used. There is also a folder called 
“Examples,” where the user can find data for testing phenotypic selection indices, 
selection indices based on coded marker scores, and genome-wide selection indices. 
The folders “Lib” and “Programs” contain information related to the functioning of 
the software; therefore, the authors strongly recommend not modifying these folders. 
Fig. 11.3 Tree diagram of the RIndSel structure 


11.4 Second Module: Capturing Parameters to Run 


Once the data have been read (first module), RIndSel moves to the second module 
(Fig. 11.4), where the user must provide some input: 


1. To choose the selection index to calculate. 

2. To select the experimental design. 

3. To identify the variables of experimental design. 

4. To choose the traits that will be used to calculate the selection index in the 
data file. 


This module is structured in such a way that calculating any selection index is 
relatively easy. There are three small buttons at the upper left of the module: 
“Back,” “Analyze,” and “Help.” Back returns to the previous module (Fig. 11.1), 
Analyze executes and calculates the selection index, and Help provides the same 
functions as described in the previous section. In addition, there are four windows, 
each of which must be filled with the correct parameters. The first one is related to 
the indices that RIndSel is able to calculate (Fig. 11.5). 


11.5 Selection Index 


In this menu, it is necessary to define the percentage of genotypes that will be 
selected. By default, it is 5%, but any other percentage can be chosen. RIndSel uses 
either the correlation matrix or the variance-covariance matrix to obtain the index; 
by default, the variance-covariance matrix is used. To work with the correlation 
matrix, the box “Correlation” should be checked. The sign for “economic weights” 
Fig. 11.4 RIndSel module of analysis 


can be used to determine the direction of the expected genetic gain of the traits. For 
example, with -1, the mean of the trait tends to decrease, whereas with 1, it 
increases. It is also possible to use the trait heritability. The economic weights can 
be assigned by creating a comma-delimited file with the name of the trait and the 
sign of its economic weight (Fig. 11.6a). Once the file has been created, it can be 
loaded by pressing the open button and browsing to where the *.csv file is located 
(Fig. 11.6b). 

To calculate the restricted linear phenotypic selection index (RLPSI or K&N, see 
Chap. 3 for details), it is necessary to create the same file and incorporate an 
additional column called “Restrictions.” This last column must be filled with the 
number one for those traits that remain fixed (restricted) and zeros for those traits that 
change (Fig. 11.7). An additional option is to ignore the “Weights” box, in which 
case RIndSel automatically presents an Excel file in which the economic weights 
can be entered; the only requirement is that the file must be saved as a 
comma delimited file. 
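
As an illustration, a weights file with a restriction column could be generated as follows. This is only a sketch: the column name “Restrictions” comes from the text above, whereas the column names “Trait” and “Weight,” the function names, and the choice of restricting T2 are assumptions made for the example; the exact layout expected by RIndSel is the one shown in Figs. 11.6 and 11.7.

```python
import csv
import io

def weights_table(signs, restricted=()):
    """Build rows: trait name, economic weight sign, and 1/0 restriction flag."""
    rows = []
    for trait, sign in signs.items():
        if sign not in (-1, 1):
            raise ValueError(f"economic weight sign for {trait} must be -1 or 1")
        rows.append({"Trait": trait, "Weight": sign,
                     "Restrictions": 1 if trait in restricted else 0})
    return rows

def to_csv(rows):
    """Serialize the rows as comma-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Trait", "Weight", "Restrictions"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical setup: T1 and T3 should increase, T2 should decrease,
# and the gain in T2 is restricted to zero.
weights_csv = to_csv(weights_table({"T1": 1, "T2": -1, "T3": 1}, restricted={"T2"}))
print(weights_csv)
```

For an unrestricted index, the same file without the last column (or with zeros in it) describes the economic weights alone.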


11.6 Experimental Design 


The menu allows the user to select the experimental field design to be used. There 
are two choices: 


1. Lattice or alpha-lattice design 
2. Randomized complete block design 
Fig. 11.6 Example of content for (a) economic weights and (b) file location 

Fig. 11.7 Economic weights for restricted selection indices 


11.7 Variable Selection 


Experimental design is strongly related to the “Variable Selection” menu, where it 
is possible to identify the variables that constitute the experimental design. Thus, 
we can choose the variables that correspond to location and replicate for a 
randomized complete block design, and also the variable that corresponds to block 
if we have a lattice or alpha-lattice experiment. 


11.8 Response Variables 


In this menu, the user can select traits to be used to calculate the selection index. It 
can be activated by clicking on the trait to be selected. Figure 11.8 shows an 
example of how this window must be filled when a Smith phenotypic selection 
index is calculated. 
Fig. 11.8 Example of parameters that could be used to calculate a phenotypic selection index 


11.9 Molecular Selection Indices 


If the selection index to be calculated is molecular, such as the Lande and Thompson 
(1990) or the linear molecular selection index (Fig. 11.9, and see Table 1.1, Chap. 1, 
for details), two additional files are required: 


1. Whole molecular markers matrix (green arrow). 
2. Marker scores or estimated quantitative trait loci values (red arrow). 


Marker scores can be obtained by making a regression of the phenotypic values 
on a codified molecular markers matrix (see Chap. 4 for details). The file can be 
created in Excel and must have the score with its respective marker for each trait; this 
file is saved with a .csv extension. An example of how these kinds of files must be 
generated is shown in Fig. 11.10a. 
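
The idea of obtaining marker scores by regression can be sketched as follows. This is a deliberately simplified illustration that regresses the phenotype on one marker at a time and then sums the coded marker values weighted by the estimated effects; the actual procedure is the one described in Chap. 4, and the data, function names, and single-marker simplification are assumptions made for the example.

```python
def regression_slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def marker_effects(markers, phenotype):
    """One slope per marker column of the coded marker matrix."""
    n_markers = len(markers[0])
    return [regression_slope([row[j] for row in markers], phenotype)
            for j in range(n_markers)]

def marker_scores(markers, effects):
    """Score of each individual: sum of coded marker values times effects."""
    return [sum(c * e for c, e in zip(row, effects)) for row in markers]

# Made-up F2 data: six individuals, two markers coded -1, 0, 1.
markers = [[-1, 1], [0, 0], [1, -1], [1, 1], [-1, -1], [0, 0]]
phenotype = [10 + 2 * m1 for m1, _ in markers]   # trait driven by marker 1 only

effects = marker_effects(markers, phenotype)
scores = marker_scores(markers, effects)
print(effects, scores)
```

In this toy data set only the first marker carries an effect, so the second marker contributes nothing to the scores.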

To calculate the scores in an F2 population, it is important for the molecular 
markers to have previously been coded as -1, 0, and 1 for genotypes aa, Aa, and 
AA, respectively. When data come from a recombinant inbred line population, the 
molecular markers should be coded as -1 and 1 for the homozygous genotypes aa and 
AA, respectively. In the context of the linear genomic selection indices (LGSI; see 
Chap. 5 for details), it is only necessary to code the molecular marker matrix 
(Fig. 11.10b), as these indices do not require a marker score. 
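
The coding rule above is easy to automate. The following is a minimal sketch, assuming genotype calls are stored as the strings "aa", "Aa", and "AA"; the function and dictionary names are hypothetical.

```python
# Coding tables taken from the rule stated in the text.
CODES = {
    "F2": {"aa": -1, "Aa": 0, "AA": 1},
    "RIL": {"aa": -1, "AA": 1},   # recombinant inbred lines: homozygotes only
}

def code_markers(genotypes, population="F2"):
    """Convert a matrix of genotype calls into the numeric marker matrix."""
    codes = CODES[population]
    coded = []
    for row in genotypes:
        for call in row:
            if call not in codes:
                raise ValueError(
                    f"unexpected genotype {call!r} for a {population} population")
        coded.append([codes[call] for call in row])
    return coded

f2_calls = [["aa", "Aa", "AA"], ["AA", "aa", "Aa"]]
print(code_markers(f2_calls, "F2"))
```

Rejecting a heterozygous call in a recombinant inbred line population, rather than silently coding it, is a design choice of this sketch; such calls usually signal residual heterozygosity or a genotyping error that should be reviewed.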
Fig. 11.9 Example of parameters that could be used to calculate a molecular selection index 

Fig. 11.10 Comma delimited files read in Excel for (a) scores of markers for traits plant height 
(PHT) and ear height (EHT), (b) a codified molecular marker matrix 


11.10 How to Use RIndSel 


The use of RIndSel can be illustrated with an example from the Smith linear 
phenotypic selection index (LPSI) (Smith 1936, see Chap. 2 for details). Figure 11.11 
shows the phenotypic data (Fig. 11.11a), together with the file of economic weights 
(Fig. 11.11b). Three simulated traits (T1, T2, and T3) described in Chap. 2 were 
used. T1 and T3 are positive (economic value = 1), whereas trait T2 is negative 
(economic value = -1). It is important to remember that all data files must be saved 
in comma delimited format (*.csv). 

After the data and economic weights files have been generated, the data need to be 
loaded into RIndSel; thus, it is important to know the path to where the files are 
located (e.g., “C://Book/datafile/C1 PSI 05 Phen.csv”). Once the data file has been 
located, it is loaded by clicking on the file, which starts the process automatically. 
It is then possible to go to the second module (Fig. 11.12) and select the subsequent 
parameters from the menus. In this case, Selection Index: Smith; Percent: 5; 
Weights: here we must browse to where the economic weights file is located, for 
example “C://Book/datafile/C1 PSI 05 Phen Weights.csv.” Once this file has been 
located, it must be selected by clicking. 
Fig. 11.11 Simulated data from Chap. 2 with (a) array in an alpha-lattice and (b) economic weights 
required to test the Smith linear phenotypic selection index (LPSI) 

Fig. 11.12 Example of filling in a phenotypic selection index without restrictions 

After the selection index window has been filled, the next menu is Experimental 
Design, which allows the user to select the appropriate design (for example, a 
lattice). To select the design variables, the user must navigate to the Variable 
Selection menu. In this example, the experiment has only one location, and the 
following should be selected: rep as Replicate, block as Block, and entry as 
Genotype. An output name for the index must be assigned by writing it in the 
“Output folder” box, which is below the Variable Selection menu. For the Smith 
LPSI, the name chosen was SmithSimulated. Finally, the Response Variables menu 
should be filled by selecting the traits T1, T2, and T3. 


11.11 RIndSel Output 


This section explains the structure of the RIndSel output. First, RIndSel presents the 
genotypic variance-covariance matrix and the phenotypic variance-covariance 
matrix (Table 11.1). In addition, when the selection index involves molecular data, 
RIndSel presents an additional molecular variance-covariance matrix, which 
contains the additive variability associated with the markers (Table 11.2). 

RIndSel also presents a table with the estimated values of the index parameters 
(Table 11.3). These estimates are the covariance between the selection index and the 
net genetic merit (breeding value), the variance of the selection index, the variance 
of the net genetic merit, the correlation between the selection index and the net 
genetic merit, the selection response, and the heritability of the index (see Chap. 2 
for additional details). 
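
To make the relationship between Tables 11.1, 11.3, and 11.4 concrete, the sketch below recomputes several of these parameters from the rounded matrices of Table 11.1. It assumes the standard LPSI expressions (b = P⁻¹Gw, variance of the index b'Pb, variance of the net genetic merit w'Gw, and per-trait expected gain kGb/√(b'Pb)), the economic weights 1, -1, and 1 of the example in Sect. 11.10, and a selection intensity of k = 2.063 for 5% selection; these are assumptions of the illustration, not a description of the RIndSel code. Because the inputs are rounded to two decimals, the results should only be close to the reported values.

```python
import math

# Rounded matrices from Table 11.1 and the economic weights of the example.
G = [[36.21, -12.93, 8.35], [-12.93, 13.04, -3.40], [8.35, -3.40, 9.96]]
P = [[62.50, -12.74, 8.53], [-12.74, 17.52, -3.38], [8.53, -3.38, 12.31]]
w = [1.0, -1.0, 1.0]
k = 2.063  # assumed standardized selection intensity for 5% selection

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(u, v))

def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

Gw = matvec(G, w)
b = solve(P, Gw)                       # index coefficients, P b = G w
var_index = dot(b, matvec(P, b))       # variance of the selection index
var_merit = dot(w, Gw)                 # variance of the net genetic merit
cov_index_merit = dot(b, Gw)           # equals var_index for the LPSI
accuracy = cov_index_merit / math.sqrt(var_index * var_merit)
heritability = dot(b, matvec(G, b)) / var_index
gains = [k * g / math.sqrt(var_index) for g in matvec(G, b)]

print("b =", b)
print("Var(I) =", var_index, "Var(H) =", var_merit)
print("correlation =", accuracy, "heritability =", heritability)
print("expected gain per trait =", gains)
```

The printed variances, correlation, and heritability can be compared with Table 11.3, and the per-trait gains with the last row of Table 11.4.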

Additional results are presented in Table 11.4, which shows the ranked selected 
individuals; this ranking was done as a function of the estimated selection index 
values. Table 11.4 also presents the means of the traits of the selected individuals; 
the means of the traits of the total population; the selection differential (see Chap. 2), 


Table 11.1 Matrices of variance-covariance displayed by RIndSel 

rownames      T1        T2        T3 
Genetic covariance matrix 
T1            36.21    -12.93      8.35 
T2           -12.93     13.04     -3.40 
T3             8.35     -3.40      9.96 
Phenotypic covariance matrix 
T1            62.50    -12.74      8.53 
T2           -12.74     17.52     -3.38 
T3             8.53     -3.38     12.31 


Table 11.2 Molecular covariance matrix 

rownames      T1        T2        T3 
T1            62.50    -12.74      8.53 
T2           -12.74     17.52     -3.38 
T3             8.53     -3.38     12.31 


Table 11.3 Estimated selection index parameters given by the RIndSel output 

Parameter                                                         Output 
Covariance between the selection index and the breeding value     86.7185 
Variance of the selection index                                   86.7185 
Variance of the breeding value                                   108.5746 
Correlation between the selection index and the breeding value     0.8937 
Response to selection                                             16.3431 
Heritability                                                       0.8168 


Table 11.4 Values of the three traits for selected individuals and the values of the Smith linear 
phenotypic selection index, means and gains with k = 5% 

rownames                        T1       T2       T3      Index 
Entry 353                     189.68    38.16    36.13    103.97 
Entry 370                     178.27    34.38    37.79    103.45 
Entry 480                     174.84    42.72    45.12    100.66 
Entry 300                     177.38    39.15    40.34    100.65 
Entry 273                     181.18    35.94    35.14    100.52 
Entry 275                     167.94    36.82    42.20     99.92 
Entry 148                     173.37    37.07    39.62     99.86 
Entry 137                     185.48    46.48    42.55     99.77 
Entry 351                     173.79    38.38    40.52     99.68 
Entry 236                     182.85    37.88    34.96     99.20 
Entry 217                     175.13    38.48    39.16     98.84 
Entry 356                     171.09    39.60    41.98     98.47 
Entry 167                     175.39    38.73    37.73     97.17 
Entry 230                     169.73    37.10    38.69     96.80 
Entry 243                     171.90    41.53    41.45     96.29 
Entry 55                      170.02    36.92    37.76     96.15 
Entry 68                      172.56    37.18    36.70     96.13 
Entry 36                      175.80    38.86    36.34     95.75 
Entry 164                     173.61    38.37    36.42     95.14 
Entry 140                     170.53    42.52    41.97     95.05 
Entry 146                     177.40    39.64    35.50     94.89 
Entry 432                     174.01    40.73    38.26     94.84 
Entry 378                     176.62    42.69    38.47     94.44 
Entry 288                     172.14    39.37    37.26     94.23 
Entry 386                     175.77    42.89    38.81     94.13 
Mean of selected individuals  175.46    39.26    38.83 
Mean of all individuals       161.88    45.19    34.39 
Selection differential         13.58    -5.92     4.44 
Expected genetic gain 5%        9.51    -5.48     4.22 


Table 11.5 First 20 values of the selection index when three traits are analyzed 

rownames      T1            T2            T3            Index 
Entry 1       [illegible]   [illegible]   [illegible]   80.80 
Entry 2       144.39        50.77         34.65         63.82 
Entry 3       157.48        48.04         37.90         77.52 
Entry 4       167.30        47.98         30.49         74.97 
Entry 5       164.11        49.89         32.03         72.85 
Entry 6       166.26        40.44         29.93         81.81 
Entry 7       154.59        52.22         30.31         63.22 
Entry 8       160.00        42.91         31.23         77.12 
Entry 9       158.51        46.32         34.52         76.25 
Entry 10      163.63        45.43         35.73         81.35 
Entry 11      156.16        46.75         35.58         75.62 
Entry 12      171.38        41.17         35.13         89.52 
Entry 13      153.17        54.18         36.23         66.79 
Entry 14      149.89        52.33         31.13         61.39 
Entry 15      159.63        49.01         31.72         70.96 
Entry 16      160.70        42.51         32.99         79.85 
Entry 17      157.07        45.49         28.40         69.68 
Entry 18      167.50        41.69         36.73         88.55 
Entry 19      159.17        50.60         36.25         73.93 
Entry 20      161.80        46.58         37.33         80.85 


and the expected genetic gain per trait. Selected individuals can be identified by the 
first column, called “rownames,” whereas columns 2 to 4 contain the best linear 
unbiased estimator of the mean of each trait. Finally, column 5 presents the 
estimated selection index values. 

The means of the selected individuals and of all individuals are compared through 
the selection differential, which in general is positive for traits whose economic 
weight was 1 and negative for traits whose economic weight was -1. The expected 
genetic gain is an inferential tool based on the normal distribution; it depends on 
the percentage of selected individuals and gives the expected genetic gain per trait 
under the estimated index. 
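
The ranking and the selection differential can be reproduced on any table in the format of the RIndSel output. The sketch below uses Entries 3 to 12 of Table 11.5 as a small stand-in population and selects the top 30% by index value; the function names and the choice of subset and percentage are assumptions made for illustration.

```python
TRAITS = ["T1", "T2", "T3"]

# Entries 3 to 12 of Table 11.5: (name, T1, T2, T3, Index).
ROWS = [
    ("Entry 3", 157.48, 48.04, 37.90, 77.52), ("Entry 4", 167.30, 47.98, 30.49, 74.97),
    ("Entry 5", 164.11, 49.89, 32.03, 72.85), ("Entry 6", 166.26, 40.44, 29.93, 81.81),
    ("Entry 7", 154.59, 52.22, 30.31, 63.22), ("Entry 8", 160.00, 42.91, 31.23, 77.12),
    ("Entry 9", 158.51, 46.32, 34.52, 76.25), ("Entry 10", 163.63, 45.43, 35.73, 81.35),
    ("Entry 11", 156.16, 46.75, 35.58, 75.62), ("Entry 12", 171.38, 41.17, 35.13, 89.52),
]
records = [dict(zip(["name"] + TRAITS + ["Index"], row)) for row in ROWS]

def select_top(records, fraction):
    """Rank by index value (highest first) and keep the requested fraction."""
    ranked = sorted(records, key=lambda r: r["Index"], reverse=True)
    n_selected = max(1, round(len(ranked) * fraction))
    return ranked[:n_selected]

def trait_means(records):
    """Mean of each trait over the given records."""
    return {t: sum(r[t] for r in records) / len(records) for t in TRAITS}

def selection_differential(records, selected):
    """Mean of the selected individuals minus mean of all individuals."""
    all_means = trait_means(records)
    selected_means = trait_means(selected)
    return {t: selected_means[t] - all_means[t] for t in TRAITS}

selected = select_top(records, 0.30)
differential = selection_differential(records, selected)
print([r["name"] for r in selected])
print(differential)
```

In this subset the differential is positive for T1 and T3 and negative for T2, in line with the economic weights 1, -1, and 1 used in the example.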

Finally, Table 11.5 shows the best linear unbiased estimators for all individuals, 
accompanied by their respective selection index values. In this case, only the first 
20 individuals were included. This table output is important, because on some 
occasions, it is necessary to determine the specific behavior of a group of genotypes 
that may not have a good performance, even though they have shown a good general 
performance in previous analyses. Another possibility is that a group of individuals 
belongs to a specific population group; thus, it is possible to select the best 
individual for this population group. 